Abstract. We consider the problem of providing optimal uncertainty quantification
(UQ), and hence rigorous certification, for partially-observed
functions. We present a UQ framework within which the observations may be
small or large in number, and need not carry information about the proba- 
bility distribution of the system in operation. The UQ objectives are posed 
as optimization problems, the solutions of which are optimal bounds on the 
quantities of interest; we consider two typical settings, namely parameter sen- 
sitivities (McDiarmid diameters) and output deviation (or failure) probabili- 
ties. The solutions of these optimization problems depend non-trivially (even 
non-monotonically and discontinuously) upon the specified legacy data. Fur- 
thermore, the extreme values are often determined by only a few members of 
the data set; in our principal physically-motivated example, the bounds are 
determined by just 2 out of 32 data points, and the remainder carry no infor- 
mation and could be neglected without changing the final answer. We propose 
an analogue of the simplex algorithm from linear programming that uses these 
observations to offer efficient and rigorous UQ for high-dimensional systems 
with high-cardinality legacy data. These findings suggest natural methods for 
selecting optimal (maximally informative) next experiments. 



1. Introduction and Outline 

1.1. Introduction. In many settings (including the physical sciences, engineering,
and finance) it is necessary to have a rigorous and also sharp/optimal quantitative
understanding of the effects of uncertainties, which are often probabilistic
in nature. Often, the available information about the system of interest comes 
in the form of legacy data, i.e. a data set that is provided "as is" and cannot be 
extended; the reasons for such restrictions may range from financial or practical 
difficulties to legal and ethical concerns. Uncertainty quantification (UQ) methods 
for addressing such problems must cope with this non-extensibility, the fact that 
the distribution of the legacy data may be unrelated to the probability distribution 
of the system in operation, and that the data set may be either very sparse or very 
large compared to the system's domain of operation. This paper approaches the 
UQ-with-legacy-data problem using the Optimal UQ framework proposed in [27]
and thereby develops and illustrates that general framework in a specific setting.

In the Optimal UQ framework [27], UQ in the presence of both epistemic and
aleatoric uncertainties [16, 25, 29] is posed as an optimization problem over all fea- 
sible scenarios that are consistent with the available information about the input 
uncertainties — those uncertainties may be infinite-dimensional in nature, and con- 
cern unknown or partially-known probability distributions and functions. In many 
cases, the corresponding infinite-dimensional optimization problem can be reduced 
to an equivalent finite-dimensional problem that allows for closed-form or numerical 
evaluation [27, §3]. 

Many UQ methods are not directly applicable if the available data are of legacy 
type. For example, in [17], it was proposed that rigorous certification of phys- 
ical systems be performed using a concentration-of-measure inequality known as 
McDiarmid's inequality [18, 19, 20], also known as the bounded differences inequality.
However, this method and its variants [1, 13, 33] require extensive data "on
demand" in order to compute the McDiarmid diameter, which measures the system 
output variability and provides the concentration rate in McDiarmid's inequality. 
Section 3 of the present paper shows how the McDiarmid diameter of a Lipschitz 
function can be optimally bounded using legacy data observations of that function 
and (upper bounds on) the Lipschitz constants. 

More strongly, Section 4 shows how to calculate optimal upper bounds on the
probability of deviations from the mean (or any linear function of the system's a 
priori unknown probability distribution) given the legacy data and (upper bounds 
on) the Lipschitz constants; this second approach forms part of a large and growing 
body of work concerning the calculation of optimal inequalities in probability the- 
ory — see e.g. [3, 5, 27] for some surveys and historical remarks on this topic. We 
find that the extremizers for our optimization problems tend to have a very sim- 
ple, low-dimensional, singular structure. Furthermore, once this singular structure 
has been observed, even approximately, it can be exploited to greatly reduce the 
computational burden; see Remark 7.2 and Figure 7.4. 

It is also shown that, in certain cases, additional information (in the form of 
new observations) may not propagate to the resulting bounds, or, dually, that the 
bounds may be determined by a relatively small "active" subset of a large data 
set. In Algorithm 5.5 we propose an analogue of the simplex algorithm in linear 
programming that uses these observations to offer efficient and rigorous UQ for 
high-dimensional systems with high-cardinality legacy data. The motivating ideas
for this algorithm are to solve easier (less constrained) optimization problems when
possible, and to terminate in a number of iterations of the
same order as the number of relevant data points. In addition, in the case that
the data set can be extended, the optimization formulation of the UQ objectives 
provides a natural notion of best next experiment (and hence maximally informative 
data set): it is the experiment that would induce the greatest change in the extreme 
value of the UQ optimization problem. 

The methods and results of this paper are predicated upon having suitable in- 
formation (or making assumptions) about the system of interest. As noted by 
Hoeffding [9], assumptions about the system of interest play a central and sensi- 
tive role in any statistical decision problem, even though the assumptions are often 
only approximations of reality. To illustrate the effect of information/assumptions, 
consider the following toy problem, which will be considered in further detail in 
Example 4.2 and treated numerically in Subsection 7.2: 



OPTIMAL UQ FOR LEGACY DATA OBSERVATIONS OF LIPSCHITZ FUNCTIONS 3 




Figure 1.1. Surface plot (a) and contour plot (b) of $P$, the least upper bound on
$\mathbb{P}[G(X) \le 0]$ given that $G\colon [0,1] \to \mathbb{R}$ has Lipschitz constant 1, mean i, and
has $(z, G(z))$ on its graph, as a function of $(z, G(z))$ in [0, i] $\times\, \mathbb{R}$.
Note the discontinuity and non-monotonicity of $P$ as a function of
$(z, G(z))$.

Example 1.1. Suppose that a measurable function $G\colon [0,1] \to \mathbb{R}$ is applied to a
random variable $X$ with unknown distribution on $[0,1]$, and the event $[G(X) \le 0]$ is
considered to constitute "failure". Given the values of $G$ on some subset $\mathcal{O} \subseteq [0,1]$,
can the failure probability $\mathbb{P}[G(X) \le 0]$ be optimally bounded from above? (N.B.:
the points of $\mathcal{O}$ may be unrelated to the distribution of $X$, and so classical methods
of statistical reasoning using the sample set $\{G(z) \mid z \in \mathcal{O}\}$ are inapplicable.)

With this information alone, the only rigorous upper bound that can be given is
the trivial one: $\mathbb{P}[G(X) \le 0] \le 1$. Consider now the impact of two further pieces
of information:

(I) $G$ is Lipschitz continuous with Lipschitz constant 1, or short, i.e.

$$ |G(x) - G(x')| \le |x - x'| \quad \text{for all } x, x' \in [0,1], $$

and hence $G$ is continuous on $[0,1]$, and by Rademacher's theorem is differentiable
with $|G'(x)| \le 1$ for Lebesgue-almost-every $x \in [0,1]$;
(II) some information about the distribution of $X$ on $[0,1]$ or the distribution
of $G(X)$ on $\mathbb{R}$, e.g. that $\mathbb{E}[G(X)] \ge m$ for some known $m$.

The first item of information does not generally provide any improvement on the
trivial upper bound, since although it constrains the set of points $x \in [0,1]$ for which
it is possible that $G(x) \le 0$, it says nothing about the $\mathbb{P}$-measure of that set, unless it
is found to be empty. However, taken together, $G|_{\mathcal{O}}$ and the two additional items
of information do provide non-trivial bounds on $\mathbb{P}[G(X) \le 0]$. Evaluating these
bounds is an infinite-dimensional but well-posed optimization problem, which can
be reduced to an equivalent finite-dimensional problem by the reduction theorems
of [27]. Indeed, as will be shown later, if $\mathcal{O}$ consists of one point (i.e. we know
one point $(z, G(z))$ that lies on the graph of $G$) then the least upper bound on
$\mathbb{P}[G(X) \le 0]$ can be given in closed form. This bound is given in (4.14), and surface
and contour plots are given in Figure 1.1. Notably, the bound is both non-monotone
and discontinuous with respect to the data point $(z, G(z))$.
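The effect of item (I) alone can be sketched numerically. The snippet below is a minimal illustration, assuming a single hypothetical data point and Lipschitz constant 1; the function names `feasible_envelope` and `failure_possible` and the sample values are illustrative choices, not taken from the paper. It computes the interval of values that a short function through $(z, G(z))$ can take at a point $x$, and hence whether failure at $x$ is possible at all.

```python
def feasible_envelope(x, z, gz, lipschitz=1.0):
    """Interval of values g(x) can take if g is Lipschitz and g(z) = gz."""
    slack = lipschitz * abs(x - z)
    return gz - slack, gz + slack


def failure_possible(x, z, gz, lipschitz=1.0):
    """True if some admissible g has g(x) <= 0."""
    lower, _ = feasible_envelope(x, z, gz, lipschitz)
    return lower <= 0.0


# Hypothetical data point (z, G(z)) = (0.25, 0.1) on [0, 1].
z, gz = 0.25, 0.1
lo, hi = feasible_envelope(0.5, z, gz)
```

As the text notes, this only identifies where failure could occur; it says nothing about the probability of that set without further information on the distribution of $X$.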

1.2. Outline. Section 2 establishes the notation and set-up of the problems of 
interest, and recalls a theorem of McShane [24] that will be useful later on. 

Section 3 treats the determination of optimal upper bounds on McDiarmid diameters
(i.e. $L^\infty$ semi-norms on component-wise oscillations of a function of several
independent inputs) using legacy data and Lipschitz constants. Such upper bounds 
can be used, together with McDiarmid's inequality and the mean performance of 
the system, to provide rigorous upper bounds on the system's probability of failure. 

Section 4 treats the problem of directly and optimally bounding the probability 
of failure, i.e. finding the least upper bound that is consistent with the legacy 
data, the Lipschitz constants, and the specified mean performance. This problem 
is harder to solve than the problem of Section 3, but is still tractable, and has the 
advantage that it provides the optimal bound on the probability of failure given all 
the available information, whereas McDiarmid's inequality is non-optimal. 

Section 5 discusses necessary and sufficient conditions for a data point to be 
relevant to the solution of the problems in Sections 3 and 4; put another way, this 
section concerns the identification of redundant information. 

Section 6 contains some general remarks applicable to both Sections 3 and 4. 

Section 7 gives the results of some example numerical implementations of the 
problems of Section 4. In this section, we see that many data points may be 
redundant in the sense of Section 5, and hence that optimal UQ for systems with 
large legacy data sets may be given by considering well-chosen small subsets of the 
larger data set. 

Section 8 outlines some directions for generalization and future work. 



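2. Review and Notation

2.1. Notation. Let $(\mathcal{X}_k, d_k)$ be a metric space for each $k \in \{1, \dots, K\}$; prototypically,
$\mathcal{X}_k = \mathbb{R}$ or $[a_k, b_k] \subseteq \mathbb{R}$ with the Euclidean distance $d_k(x, y) := |x - y|$.
Let $\mathcal{X} := \mathcal{X}_1 \times \dots \times \mathcal{X}_K$. Let $G\colon \mathcal{X} \to \mathbb{R}$ be some function and suppose that, for
each $k \in \{1, \dots, K\}$, $L_k$ is a global Lipschitz constant for $G$ with respect to its $k^{\mathrm{th}}$
argument: i.e.,

$$ (x, x' \in \mathcal{X},\ x^j = x'^j \text{ for } j \ne k) \implies |G(x) - G(x')| \le L_k d_k(x^k, x'^k). \tag{2.1} $$

Define a quasi-metric $d_L\colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ by

$$ d_L(x, x') := \sum_{k=1}^{K} L_k d_k(x^k, x'^k). \tag{2.2} $$

If all $L_k$ are strictly positive, then $d_L$ is a metric. In the prototypical case, $d_L$ is a
rescaling of the "Manhattan" metric on $\mathbb{R}^K$.

Lemma 2.1. A function $f\colon \mathcal{X} \to \mathbb{R}$ is Lipschitz with Lipschitz constant $L_k$ in its
$k^{\mathrm{th}}$ argument if, and only if, it is short with respect to the metric $d_L$, i.e.

$$ |f(x) - f(x')| \le d_L(x, x') \quad \text{for all } x, x' \in \mathcal{X}. \tag{2.3} $$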
Proof. Suppose that $f$ is short with respect to $d_L$, and let $k \in \{1, \dots, K\}$. Let
$x, x' \in \mathcal{X}$ differ only in their $k^{\mathrm{th}}$ component. Then

$$ |f(x) - f(x')| \le \sum_{j=1}^{K} L_j d_j(x^j, x'^j) = L_k d_k(x^k, x'^k), $$

and so $f$ is Lipschitz with Lipschitz constant $L_k$ in its $k^{\mathrm{th}}$ argument. Conversely,
suppose that $f$ is Lipschitz with Lipschitz constant $L_k$ in its $k^{\mathrm{th}}$ argument for each $k$, and let
$x, x' \in \mathcal{X}$. Then

$$ \begin{aligned}
|f(x) - f(x')| &\le |f(x) - f(x'^1, x^2, \dots, x^K)| \\
&\quad + |f(x'^1, x^2, \dots, x^K) - f(x'^1, x'^2, x^3, \dots, x^K)| \\
&\quad + \dots + |f(x'^1, \dots, x'^{K-1}, x^K) - f(x')| \\
&\le L_1 |x^1 - x'^1| + \dots + L_K |x^K - x'^K| \\
&= d_L(x, x'),
\end{aligned} $$

and so $f$ is short with respect to $d_L$. □
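Lemma 2.1 can be checked numerically on a finite set of points. The following sketch assumes the prototypical case in which each $d_k$ is the absolute difference; the helper names `d_L` and `is_short_on`, the test function and the grid are hypothetical and serve only as an illustration.

```python
import itertools


def d_L(x, xp, L):
    """Weighted Manhattan distance sum_k L_k |x^k - x'^k| (prototypical case)."""
    return sum(Lk * abs(a - b) for Lk, a, b in zip(L, x, xp))


def is_short_on(points, f, L, tol=1e-12):
    """Check |f(x) - f(x')| <= d_L(x, x') for all pairs in a finite set of points."""
    return all(
        abs(f(x) - f(xp)) <= d_L(x, xp, L) + tol
        for x, xp in itertools.combinations(points, 2)
    )


# Hypothetical example: f(x) = 2 x^1 - 0.5 x^2 on a 5 x 5 grid of [0, 1]^2.
grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]
f = lambda x: 2.0 * x[0] - 0.5 * x[1]
```

With the component-wise constants `(2.0, 0.5)` the check passes on the grid, whereas an underestimate such as `(1.0, 0.5)` is detected as a violation. A passing check on finitely many points is of course only a necessary condition for shortness on all of the domain.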

$\mathcal{P}(\mathcal{X})$ denotes the set of all Borel probability measures on $\mathcal{X}$. The product of
probability measures $\mu_k \in \mathcal{P}(\mathcal{X}_k)$ for $k \in \{1, \dots, K\}$ will be denoted $\mu_1 \otimes \dots \otimes \mu_K$
or $\bigotimes_{k=1}^{K} \mu_k$; the set of all such measures will be denoted $\bigotimes_{k=1}^{K} \mathcal{P}(\mathcal{X}_k)$. Recall that
if $X = (X_1, \dots, X_K)$ is an $\mathcal{X}$-valued random variable with law $\mu$, then saying that
the $K$ components of $X$ are independent is the same as saying that $\mu$ is a product
measure $\bigotimes_{k=1}^{K} \mu_k$, where $\mu_k$ is the law of $X_k$ on $\mathcal{X}_k$.

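For $f\colon \mathcal{X} \to \mathbb{R}$, let $\mathcal{D}_k[f]$ be the $k^{\mathrm{th}}$ McDiarmid subdiameter of $f$ on $\mathcal{X}$:

$$ \mathcal{D}_k[f] := \sup\{ |f(x) - f(x')| \mid x, x' \in \mathcal{X},\ x^j = x'^j \text{ for } j \ne k \}. \tag{2.4} $$

$\mathcal{D}_k[f]$ is a global sensitivity index that measures the sensitivity of $f$ to changes in
its $k^{\mathrm{th}}$ argument. The McDiarmid diameter $\mathcal{D}[f]$ of $f$ on $\mathcal{X}$ is defined by

$$ \mathcal{D}[f] := \left( \sum_{k=1}^{K} \mathcal{D}_k[f]^2 \right)^{1/2}. \tag{2.5} $$

Each $\mathcal{D}_k[\cdot]$ (and, indeed, $\mathcal{D}[\cdot]$) is a semi-norm on the space of bounded real-valued
functions on $\mathcal{X}$; any $f$ that is constant in its $k^{\mathrm{th}}$ argument has $\mathcal{D}_k[f] = 0$.

Clearly, if $f\colon \mathcal{X} \to \mathbb{R}$ is known to be $d_L$-short, then this information provides a
(not necessarily sharp) upper bound on the McDiarmid diameter of $f$:

$$ \mathcal{D}_k[f] \le L_k \operatorname{diam}(\mathcal{X}_k, d_k) := L_k \sup_{x^k, x'^k \in \mathcal{X}_k} d_k(x^k, x'^k). \tag{2.6} $$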
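For intuition, the subdiameters and the diameter can be approximated by brute force when the function can be evaluated freely (which is precisely what is not available in the legacy-data setting). The sketch below restricts the supremum to a product grid, so it returns a lower approximation of the true values; the function names and the test function are hypothetical.

```python
import itertools
import math


def subdiameter_on_grid(f, axes, k):
    """Brute-force D_k[f] restricted to a product grid: vary only coordinate k."""
    best = 0.0
    others = [ax for i, ax in enumerate(axes) if i != k]
    for rest in itertools.product(*others):
        values = []
        for xk in axes[k]:
            x = list(rest)
            x.insert(k, xk)  # put the varying coordinate back in position k
            values.append(f(tuple(x)))
        best = max(best, max(values) - min(values))
    return best


def diameter_on_grid(f, axes):
    """Grid approximation of the McDiarmid diameter (root sum of squared subdiameters)."""
    K = len(axes)
    return math.sqrt(sum(subdiameter_on_grid(f, axes, k) ** 2 for k in range(K)))


# Hypothetical test function on [0, 1]^2.
axis = [i / 10 for i in range(11)]
f = lambda x: x[0] * x[1] + 0.5 * x[1]
```

For this particular function the component-wise Lipschitz constants are 1 and 1.5, and the grid values coincide with the Lipschitz-times-diameter upper bounds, so the bound is sharp here; in general it need not be.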

The McDiarmid diameter is useful because it places an upper bound on deviations
of $f(X)$ from its mean value whenever $X$ is an $\mathcal{X}$-valued random variable
with independent components, as the following result shows:

Theorem 2.2 (McDiarmid's inequality [18, 19, 20]). Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability
space and, for $k \in \{1, \dots, K\}$, let $X_k\colon \Omega \to \mathcal{X}_k$ be independent random variables.
Suppose that $\mathbb{E}[|f(X)|]$ is finite. Then, for any $r > 0$,

$$ \mathbb{P}\bigl[ f(X) - \mathbb{E}[f(X)] \ge r \bigr] \le \exp\left( -\frac{2 r^2}{\mathcal{D}[f]^2} \right), \tag{2.7} $$

$$ \mathbb{P}\bigl[ f(X) - \mathbb{E}[f(X)] \le -r \bigr] \le \exp\left( -\frac{2 r^2}{\mathcal{D}[f]^2} \right). \tag{2.8} $$

The independence assumption in McDiarmid's inequality can be relaxed and
replaced with some control on the martingale differences $\mathbb{E}[f(X) \mid \mathcal{F}_{i+1}] - \mathbb{E}[f(X) \mid \mathcal{F}_i]$
of $f(X)$ with respect to a suitable filtration $\mathcal{F}_\bullet$ of the probability space $(\Omega, \mathcal{F}, \mathbb{P})$.
Also, the mean and McDiarmid subdiameters can be used as inputs for sharper
inequalities such as the optimal McDiarmid inequality of [27].

Given a measurable system of interest $G\colon \mathcal{X} \to \mathbb{R}$, let $\theta \in \mathbb{R}$ denote a possible
value of $G$ that is considered to be a failure threshold: the event $[G(X) \le \theta]$
represents the failure of the system $G$, and the complementary event $[G(X) > \theta]$
represents the success of the system $G$. Under the assumption that the random inputs
of $G$ (i.e. the coordinate processes $X_1, \dots, X_K$) are independent, McDiarmid's
inequality implies that the probability of failure is bounded as follows:

$$ \mathbb{P}[G(X) \le \theta] \le \exp\left( -\frac{2 (\mathbb{E}[G(X)] - \theta)_+^2}{\mathcal{D}[G]^2} \right), \tag{2.9} $$

where, for $t \in \mathbb{R}$, $t_+ := \max\{0, t\}$. The inequality (2.9) can be rearranged in order
to provide rigorous certification criteria for computational and physical systems of
interest, subject to the determination of the mean system performance $\mathbb{E}[G(X)]$
and the McDiarmid diameter $\mathcal{D}[G]$; see e.g. [17, 13, 1]. Namely, if $\epsilon \in [0,1]$ is
the greatest probability of failure that can be accepted if the system is to be called
safe, and it is known that $\mathbb{E}[G(X)] \ge m$ and $\mathcal{D}[G] \le D$, then a sufficient condition
for the safety of the system is the validity of the inequality

$$ \exp\left( -\frac{2 (m - \theta)_+^2}{D^2} \right) \le \epsilon. \tag{2.10} $$
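The certification criterion obtained from McDiarmid's inequality is straightforward to evaluate. The sketch below assumes the standard form of the bound, $\exp(-2 (m - \theta)_+^2 / D^2)$, with a lower bound $m$ on the mean and an upper bound $D$ on the McDiarmid diameter; the function names and the handling of the degenerate case $D = 0$ are choices made for this illustration.

```python
import math


def mcdiarmid_failure_bound(mean_lower, theta, diameter_upper):
    """Upper bound exp(-2 (m - theta)_+^2 / D^2) on P[G(X) <= theta]."""
    margin = max(0.0, mean_lower - theta)
    if diameter_upper == 0.0:
        # Degenerate case: no variability, so failure is impossible
        # exactly when the mean lies strictly above the threshold.
        return 0.0 if margin > 0.0 else 1.0
    return math.exp(-2.0 * margin**2 / diameter_upper**2)


def certify(mean_lower, theta, diameter_upper, epsilon):
    """Sufficient (not necessary) condition for P[G(X) <= theta] <= epsilon."""
    return mcdiarmid_failure_bound(mean_lower, theta, diameter_upper) <= epsilon
```

For example, with $m = 1$, $\theta = 0$ and $D = 1$ the bound is $e^{-2}$, so the system would be certified at tolerance 0.2 but not at tolerance 0.1. A failed check does not show that the system is unsafe, only that this particular inequality is too weak to certify it.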



2.2. UQ Problem Formulation. Suppose that the values of $G$ are known only
on some observation set $\mathcal{O} \subseteq \mathcal{X}$; that is, the restriction $G|_{\mathcal{O}}$ of $G$ to $\mathcal{O}$ is known
exactly. In applications, it is usually the case that $\mathcal{O}$ is a finite collection of points
$\mathcal{O} = \{z_1, \dots, z_N\} \subseteq \mathcal{X}$. Suppose also that constants $L_1, \dots, L_K \ge 0$ are given such
that $G$ is known to be $d_L$-short. The main questions that this paper addresses are
the following:

(1) Section 3 shows how to use the observations $G|_{\mathcal{O}}$ and the Lipschitz constants
$L = (L_1, \dots, L_K)$ to provide an optimal upper bound $D$ on the McDiarmid
diameter $\mathcal{D}[G]$.

(2) Section 4 shows how to use the data $G|_{\mathcal{O}}$, the constants $L$ and the mean
performance $\mathbb{E}[G(X)]$ to provide an optimal upper bound $P$ on the probability
of failure $\mathbb{P}[G(X) \le \theta]$.

(3) Section 5 considers the problem of determining which observations $z \in \mathcal{O}$
are relevant to the solutions of the problems in the previous two sections.
Furthermore, one can consider the dual problem: if the data set $G|_{\mathcal{O}}$ could
be extended, at what points of the input parameter space $\mathcal{X}$ should $G$
be evaluated to gain maximally relevant information that will improve the
bounds $D$ and $P$?
Remark 2.3 (Other UQ problems). Although the exposition of this paper treats
the certification problem of bounding $\mathbb{P}[G(X) \le \theta]$, there are many other uncertainty
quantification problems (e.g. verification, validation, and prediction [ ])
to which this paper's methods are applicable. For example, $G$ above may actually
stand for the difference between some physical system, $H$, and a model for that
system, $F$; if the aim is to predict values of $H$ using the simulation $F$ with quantified
error bounds, then this is tantamount to showing that $\mathbb{P}[\|H(X) - F(X)\| > \theta]$
is suitably small, where $\|\cdot\|$ is some "error norm" on (a subset of) parameter space
$\mathcal{X}$. This certification-centric point of view is similar to that of [4], in which many
reliability problems are placed in a unified framework, and that of [27].

Remark 2.4 (Other regularity conditions). In many practical applications, of
course, the response function $G$ is not known to be globally Lipschitz. In this paper
attention is confined to the globally Lipschitz case as a representative example of
a broad class of possible constraints. For example, it may be more appropriate to
consider a Hölder-type constraint, which would correspond to an inequality of the
form

$$ |G(x) - G(x')| \le d_L(x, x')^\alpha; \tag{2.11} $$

or a local Lipschitz constraint, which would correspond to an inequality of the form

(2.12)

The example of Subsection 7.3 will use just such a modified Lipschitz constraint,
one suited to possibly discontinuous or multivalued functions. The minimum
requirement on any proposed system of inequalities to constrain $G$ is that the desired
inequalities should hold whenever $x$ and $x'$ are elements of the observation set $\mathcal{O}$,
and that the inequalities must constrain the values of $G$ pointwise. So, for example,
a constraint on the Sobolev $W^{k,p}(\mathbb{R}^d)$ norm of a function $G\colon \mathbb{R}^d \to \mathbb{R}$ would not be
a suitable constraint if $kp < d$.

Remark 2.5 (Other types of observation). In this paper the observations of $G$
are pointwise evaluations of $G$ at finitely many points of its domain. One could
also consider more general observation operators, e.g. a continuous linear functional
$A\colon W^{k,p}(\mathbb{R}^d) \to \mathbb{R}$, or a collection of such operators.

2.3. Extension of Partially-Defined Functions. In what follows, in order to
show that the upper bounds that are obtained are in fact the optimal upper bounds
given the available information, it will be necessary to invoke the following extension
theorem from metric space theory, which states that a real-valued Lipschitz function
defined on any subset of a metric space can always be extended to the whole space
without increasing the Lipschitz constant:

Theorem 2.6 (McShane's extension theorem [24]). Let $(M, \rho)$ be a metric space,
let $E \subseteq M$, and let $C \ge 0$. If $f\colon E \to \mathbb{R}$ satisfies

$$ |f(x) - f(x')| \le C \rho(x, x') \quad \text{for all } x, x' \in E, $$

then there exists $\tilde{f}\colon M \to \mathbb{R}$ such that $\tilde{f}|_E = f$ and

$$ |\tilde{f}(x) - \tilde{f}(x')| \le C \rho(x, x') \quad \text{for all } x, x' \in M. $$
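An extension of the kind guaranteed by McShane's theorem can be written down explicitly when the set of known points is finite. The sketch below uses one classical construction, the upper envelope $x \mapsto \min_z (f(z) + C \rho(x, z))$; the choice of this particular formula, the function name `mcshane_extension` and the sample data are assumptions of the illustration rather than something stated in the text above.

```python
def mcshane_extension(points, values, C, rho):
    """Return x -> min_z (f(z) + C * rho(x, z)), a C-Lipschitz extension of f."""
    def extended(x):
        return min(fz + C * rho(x, z) for z, fz in zip(points, values))
    return extended


# Hypothetical one-dimensional data, short (C = 1) with respect to |x - z|.
E = [0.0, 0.5, 1.0]
f_on_E = [0.0, 0.3, 0.1]
f_ext = mcshane_extension(E, f_on_E, 1.0, lambda a, b: abs(a - b))
```

The returned function agrees with the data on `E` provided the data are themselves $C$-Lipschitz, and it is the largest extension with that Lipschitz constant; replacing `min` and `+` by `max` and `-` gives the smallest one.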
McShane's theorem also applies to the extension of Hölder continuous real-valued
functions defined on a subset of a metric space; any continuous real-valued function
with concave modulus of continuity can be extended to the whole space while
preserving the modulus of continuity.

In the language of metric space theory, McShane's extension theorem says that
the Euclidean line $(\mathbb{R}, |\cdot|)$ is an injective metric space [11]. The extension of
vector-valued Lipschitz functions is a subtle topic; see e.g. the Kirszbraun-Valentine
theorem [14, 3-^], which states that Lipschitz functions between Hilbert spaces can
always be extended without increasing the Lipschitz constant, which is not generally
true even for Lipschitz functions between finite-dimensional Banach spaces [ , p.
202]. It is for this reason that this paper considers only scalar-valued performance
measures $G$.

3. Optimal Bounds on McDiarmid Diameters 

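For each $k \in \{1, \dots, K\}$, an upper bound on the McDiarmid subdiameter $\mathcal{D}_k[G]$
can be obtained by an optimization problem. First observe that $\mathcal{D}_k[G]$ is the
maximum value of the function

$$ \operatorname{diff}_k G \colon \mathcal{X}_1 \times \dots \times \mathcal{X}_{k-1} \times \mathcal{X}_k \times \mathcal{X}_k \times \mathcal{X}_{k+1} \times \dots \times \mathcal{X}_K \to \mathbb{R} $$

defined by

$$ (\operatorname{diff}_k G)(x^1, \dots, x^{k-1}, x^k, x'^k, x^{k+1}, \dots, x^K) := G(x^1, \dots, x^k, \dots, x^K) - G(x^1, \dots, x'^k, \dots, x^K). $$

(Indeed, $\mathcal{D}_k[G]$ is also the negative of the minimum value of $\operatorname{diff}_k G$.) Therefore,
an upper bound on $\mathcal{D}_k[G]$ consistent with the observations $G|_{\mathcal{O}}$ and the Lipschitz
constants $L = (L_1, \dots, L_K)$ is given by the solution of the following optimization
problem in the $K + 3$ variables $x^1, \dots, x^K, x'^k, y, y'$:

$$ \begin{array}{ll}
\text{maximize:} & |y - y'|; \\
\text{among:} & (x, y) \in \mathcal{X} \times \mathbb{R}, \\
 & (x', y') \in \mathcal{X} \times \mathbb{R}; \\
\text{subject to:} & x^i = x'^i \text{ for all } i \in \{1, \dots, K\} \setminus \{k\}; \\
 & |y - y'| \le L_k d_k(x^k, x'^k); \\
 & \text{for all } z \in \mathcal{O}\colon \\
 & \quad |y - G(z)| \le d_L(x, z), \\
 & \quad |y' - G(z)| \le d_L(x', z).
\end{array} \tag{3.1} $$

Note that (3.1) is not a linear programming problem: the feasible set for $(x, y)$ and
$(x', y')$ is an intersection of double cones in $\mathcal{X} \times \mathbb{R}$, as illustrated in Figure 3.1. Note
also that (3.1) is not a cone program in the sense of [ , §4.6.1]; that term refers
instead to the minimization of a linear objective function over a closed convex cone
that contains no lines and has non-empty interior.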

A point $(x, y) \in \mathcal{X} \times \mathbb{R}$ such that $|y - G(z)| \le d_L(x, z)$ is said to be feasible with
respect to the data point $(z, G(z))$; if this holds for all $z \in \mathcal{O}$, then $(x, y)$ is said to
be $G|_{\mathcal{O}}$-feasible.
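The feasibility conditions are simple to test for a finite observation set. The following sketch assumes the prototypical case $d_k(a, b) = |a - b|$ and represents the data as a list of pairs `(z, G(z))`; the function names and the sample observations are hypothetical.

```python
def d_L(x, z, L):
    """Weighted Manhattan distance sum_k L_k |x^k - z^k|."""
    return sum(Lk * abs(a - b) for Lk, a, b in zip(L, x, z))


def is_feasible(x, y, data, L, tol=1e-12):
    """True if |y - G(z)| <= d_L(x, z) for every observation (z, G(z))."""
    return all(abs(y - gz) <= d_L(x, z, L) + tol for z, gz in data)


def satisfies_constraints(x, y, xp, yp, k, data, L, tol=1e-12):
    """Check all constraints of the subdiameter problem for coordinate k."""
    same_elsewhere = all(a == b for i, (a, b) in enumerate(zip(x, xp)) if i != k)
    lipschitz_in_k = abs(y - yp) <= L[k] * abs(x[k] - xp[k]) + tol
    return (same_elsewhere and lipschitz_in_k
            and is_feasible(x, y, data, L, tol)
            and is_feasible(xp, yp, data, L, tol))


# Hypothetical observations of a short function on [0, 1].
data = [((0.0,), 0.0), ((1.0,), 0.5)]
L = (1.0,)
```

Here the point `((0.5,), 0.25)` is feasible with respect to both observations, while `((0.5,), 0.6)` violates the constraint imposed by the observation at 0.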

Let $D_k[\mathcal{X}, G|_{\mathcal{O}}, d_L]$ (or simply $D_k$) denote the upper bound on $\mathcal{D}_k[G]$ that arises
as the solution (extreme value) of the optimization problem (3.1). It is natural to
ask whether or not $D_k$ is the least upper bound on $\mathcal{D}_k[G]$ given $G|_{\mathcal{O}}$ and $L$. In fact,
this is the case, and the proof relies on McShane's extension theorem.

Figure 3.1. Shaded, the feasible set for $(x, y)$ and $(x', y')$ in $\mathcal{X} \times \mathbb{R}$
for the optimization problem (3.1) given the four observations represented
by the four black dots. The white dots show the optimal
values for $(x, y)$ and $(x', y')$. Note that, as well as being constrained
to lie in the shaded feasible set, $(x, y)$ and $(x', y')$ must also satisfy
$|y - y'| \le d_L(x, x') = L_k d_k(x^k, x'^k)$.

Theorem 3.1 (Optimality of $D_k$). Let

$$ \mathcal{E}(\mathcal{X}, G|_{\mathcal{O}}, d_L) := \{ g\colon \mathcal{X} \to \mathbb{R} \mid g \text{ is } d_L\text{-short and } g = G \text{ on } \mathcal{O} \} \tag{3.2} $$

denote the set of all functions on $\mathcal{X}$ that have Lipschitz constant $L$ and interpolate
the given values of $G$ on $\mathcal{O}$. Then the maximum value $D_k$ of (3.1) is the optimal
upper bound on $\mathcal{D}_k[G]$ given $G|_{\mathcal{O}}$ and $L$ in the sense that

$$ D_k = \sup\{ \mathcal{D}_k[g] \mid g \in \mathcal{E}(\mathcal{X}, G|_{\mathcal{O}}, d_L) \}. \tag{3.3} $$

Proof. Let $S$ denote the supremum on the right-hand side of (3.3). Suppose that
$D_k > S$. Then there exist points $(x, y)$ and $(x', y') \in \mathcal{X} \times \mathbb{R}$ that satisfy the
constraints in (3.1) and the inequality

$$ S < |y - y'| \le D_k. $$

Define $g\colon \mathcal{O} \cup \{x, x'\} \to \mathbb{R}$ by

$$ g(z) := G(z) \text{ for each } z \in \mathcal{O}, \qquad g(x) := y, \qquad g(x') := y'. $$

This $g$ is $d_L$-short, and so McShane's extension theorem implies that $g$ can be
extended to some $d_L$-short $\tilde{g}\colon \mathcal{X} \to \mathbb{R}$. Necessarily, $\tilde{g}|_{\mathcal{O}} = g|_{\mathcal{O}} = G|_{\mathcal{O}}$. However, by
construction, $\mathcal{D}_k[\tilde{g}] \ge |y - y'| > S$, which contradicts the definition of $S$. Hence, by
contradiction, $D_k \le S$.

Now suppose that $D_k < S$. Then there exists some $d_L$-short $g\colon \mathcal{X} \to \mathbb{R}$ such
that $g = G$ on $\mathcal{O}$ and $\mathcal{D}_k[g] > D_k$; hence, there exist points $x, x' \in \mathcal{X}$ that differ
only in their $k^{\mathrm{th}}$ component and such that

$$ D_k < |g(x) - g(x')| \le \mathcal{D}_k[g]. $$

However, $x$ and $x'$ with $y := g(x)$ and $y' := g(x')$ satisfy the constraints in (3.1), and
so $D_k \ge |g(x) - g(x')|$, which is a contradiction. Hence, $D_k \ge S$, which completes
the proof. □

Remark 3.2. It is important to note that although $D_k$ is the optimal upper bound
on $\mathcal{D}_k[G]$ given $G|_{\mathcal{O}}$ and $L$, and hence $D := (D_1^2 + \dots + D_K^2)^{1/2} \ge \mathcal{D}[G]$, it
is not generally true that $D$ is the optimal upper bound on $\mathcal{D}[G]$ given the same
information ($G|_{\mathcal{O}}$ and $L$). The reason for this is that the (approximate) maximizers
for, say, $D_1$ and $D_K$ may not be mutually consistent, i.e. $d_L$-short.
Note also that the upper bound

$$ \mathbb{P}[G(X) \le \theta] \le \exp\left( -\frac{2 (\mathbb{E}[G(X)] - \theta)_+^2}{\sum_{k=1}^{K} D_k^2} \right) $$

is not the optimal upper bound on $\mathbb{P}[G(X) \le \theta]$ given $\mathbb{E}[G(X)]$ and that $\mathcal{D}_k[G] \le D_k$.
The optimal such bound is given by the optimal McDiarmid inequality of [27,
§4].

3.1. Error Bounds. In addition to the question of optimality, it is natural to ask
whether or not solutions $D_k$ of (3.1) converge to the McDiarmid subdiameter $\mathcal{D}_k[G]$
as the number of observations increases to infinity. Unsurprisingly, the important
quantity is not the number of observations, but rather the largest gap between
them, as measured by the metric $d_L$. Define the gap size of the observation set $\mathcal{O}$
on $\mathcal{X}$ to be the (asymmetric) Hausdorff distance from $\mathcal{X}$ to $\mathcal{O}$ with respect to $d_L$,
i.e.

$$ \Gamma(\mathcal{X}, \mathcal{O}, d_L) := \sup_{x \in \mathcal{X}} d_L(x, \mathcal{O}) := \sup_{x \in \mathcal{X}} \inf_{z \in \mathcal{O}} d_L(x, z). \tag{3.4} $$

Theorem 3.3 (Error bound for $D_k$). For any $G\colon \mathcal{X} \to \mathbb{R}$ with finite McDiarmid
subdiameter $\mathcal{D}_k[G]$ and any $\mathcal{O} \subseteq \mathcal{X}$,

$$ 0 \le D_k - \mathcal{D}_k[G] \le 4 \Gamma(\mathcal{X}, \mathcal{O}, d_L). \tag{3.5} $$

Proof. Theorem 3.1 shows that $D_k \ge \mathcal{D}_k[G]$, so it remains to show the effective
"$4\Gamma$" part of the error estimate.

Let $\varepsilon > 0$ be arbitrary. Let $(x, y)$ and $(x', y') \in \mathcal{X} \times \mathbb{R}$ satisfy the constraints in
(3.1) and be $\varepsilon$-approximate maximizers for that problem, i.e.

$$ D_k - \varepsilon \le |y - y'| \le D_k. $$

Then, even though the values $G(x)$ and $G(x')$ may be unknown since $x$ and $x'$ are
not necessarily members of $\mathcal{O}$, the following estimate holds:

$$ \begin{aligned}
D_k &\le |y - y'| + \varepsilon \\
&\le |y - G(x)| + |G(x) - G(x')| + |G(x') - y'| + \varepsilon \\
&\le |y - G(x)| + \mathcal{D}_k[G] + |G(x') - y'| + \varepsilon.
\end{aligned} $$

By the definition of the gap size, there exists some $z \in \mathcal{O}$ such that $d_L(x, z) \le
\Gamma(\mathcal{X}, \mathcal{O}, d_L)$, and so

$$ \begin{aligned}
|y - G(x)| &\le |y - G(z)| + |G(z) - G(x)| \\
&\le d_L(x, z) + d_L(z, x) \\
&\le 2 \Gamma(\mathcal{X}, \mathcal{O}, d_L).
\end{aligned} $$

Similarly, there exists $z' \in \mathcal{O}$ such that $d_L(x', z') \le \Gamma(\mathcal{X}, \mathcal{O}, d_L)$, and so
$|G(x') - y'| \le 2 \Gamma(\mathcal{X}, \mathcal{O}, d_L)$. Therefore, $D_k \le \varepsilon + 4 \Gamma(\mathcal{X}, \mathcal{O}, d_L) + \mathcal{D}_k[G]$ and, since $\varepsilon > 0$ was arbitrary,
the claim follows. □

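3.2. Structure of the Feasible Set. The constrained optimization problem (3.1)
entails exploration of the feasible set $\mathcal{E}(\mathcal{X}, G|_{\mathcal{O}}, d_L)$ of $d_L$-short extensions of the
data $G|_{\mathcal{O}}$ to all of $\mathcal{X}$:

$$ \mathcal{E}(\mathcal{X}, G|_{\mathcal{O}}, d_L) := \{ g\colon \mathcal{X} \to \mathbb{R} \mid g \text{ is } d_L\text{-short and } g = G \text{ on } \mathcal{O} \}. $$

Note that $\mathcal{E}(\mathcal{X}, G|_{\mathcal{O}}, d_L)$ is not a linear space, but is a convex subset of the linear
space of all real-valued functions on $\mathcal{X}$. Furthermore, by the Arzelà-Ascoli theorem,
$\mathcal{E}(\mathcal{X}, G|_{\mathcal{O}}, d_L)$ is complete with respect to the uniform (supremum) norm, and is
compact whenever $\mathcal{X}$ is compact. McShane's extension theorem (Theorem 2.6) is
the assertion that, whenever $G|_{\mathcal{O}}$ has Lipschitz constant $L$ on $\mathcal{O}$, $\mathcal{E}(\mathcal{X}, G|_{\mathcal{O}}, d_L)$
is non-empty. Theorem 3.1 states that the maximum value of $\mathcal{D}_k[g]$ over
$g \in \mathcal{E}(\mathcal{X}, G|_{\mathcal{O}}, d_L)$ can be found by restricting attention to a finite-dimensional
subset as described by the constraints in (3.1). Indeed, this search can be made
even simpler than (3.1) suggests by considering the structure of the problem for $y$ and
$y'$ with fixed $x$ and $x'$.

For fixed $x \in \mathcal{X}$, define the least and greatest feasible values of $g(x)$ among
$g \in \mathcal{E}(\mathcal{X}, G|_{\mathcal{O}}, d_L)$ by

$$ Y^-(x, G|_{\mathcal{O}}, L) := \inf_{g \in \mathcal{E}(\mathcal{X}, G|_{\mathcal{O}}, d_L)} g(x) = \sup_{z \in \mathcal{O}} \bigl( G(z) - d_L(x, z) \bigr), $$

$$ Y^+(x, G|_{\mathcal{O}}, L) := \sup_{g \in \mathcal{E}(\mathcal{X}, G|_{\mathcal{O}}, d_L)} g(x) = \inf_{z \in \mathcal{O}} \bigl( G(z) + d_L(x, z) \bigr). $$

Note that these quantities are easily calculated when $\mathcal{O}$ is a finite set. The pair
$(y, y') \in \mathbb{R}^2$ is feasible (i.e. $y = g(x)$ and $y' = g(x')$ for some $g \in \mathcal{E}(\mathcal{X}, G|_{\mathcal{O}}, d_L)$,
where $x, x' \in \mathcal{X}$ differ only in their $k^{\mathrm{th}}$ component) if, and only if,

$$ \begin{aligned}
y &\in [Y^-(x, G|_{\mathcal{O}}, L),\ Y^+(x, G|_{\mathcal{O}}, L)], \\
y' &\in [Y^-(x', G|_{\mathcal{O}}, L),\ Y^+(x', G|_{\mathcal{O}}, L)], \text{ and} \\
|y - y'| &\le d_L(x, x') = L_k d_k(x^k, x'^k).
\end{aligned} $$

So, for each $(x, x')$, the set of feasible $(y, y')$ is a closed and convex polygon in $\mathbb{R}^2$.
The maximum value of $|y - y'|$ over this polygon is $A(x, x')$, defined by

$$ A(x, x') := \min\Bigl\{ d_L(x, x'),\ \max\bigl\{ Y^+(x, G|_{\mathcal{O}}, L) - Y^-(x', G|_{\mathcal{O}}, L),\ Y^+(x', G|_{\mathcal{O}}, L) - Y^-(x, G|_{\mathcal{O}}, L) \bigr\} \Bigr\}. $$

The constrained optimization problem (3.1) is, therefore, equivalent to the following
unconstrained (and, therefore, more easily solved) problem in $K + 1$ variables
$x^1, \dots, x^K, x'^k$:

$$ \begin{array}{ll}
\text{maximize:} & A(x, x'); \\
\text{among:} & x \in \mathcal{X},\ x'^k \in \mathcal{X}_k,
\end{array} \tag{3.6} $$

where $x'$ agrees with $x$ in every component except possibly the $k^{\mathrm{th}}$.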
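The reduction to an unconstrained problem lends itself to a direct numerical sketch. The code below assumes the prototypical metric $d_k(a, b) = |a - b|$, a finite observation set given as pairs `(z, G(z))`, and a crude grid search in place of a proper optimizer; all function names and the sample data are hypothetical. The quantity `largest_gap` is the maximum of $|y - y'|$ over the feasible polygon, computed as the smaller of the Lipschitz cap $d_L(x, x')$ and the larger of the two envelope differences.

```python
import itertools


def d_L(x, z, L):
    """Weighted Manhattan distance sum_k L_k |x^k - z^k|."""
    return sum(Lk * abs(a - b) for Lk, a, b in zip(L, x, z))


def y_minus(x, data, L):
    """Least feasible value at x: max_z (G(z) - d_L(x, z))."""
    return max(gz - d_L(x, z, L) for z, gz in data)


def y_plus(x, data, L):
    """Greatest feasible value at x: min_z (G(z) + d_L(x, z))."""
    return min(gz + d_L(x, z, L) for z, gz in data)


def largest_gap(x, xp, data, L):
    """Maximum of |y - y'| over feasible (y, y') for fixed x and x'."""
    spread = max(y_plus(x, data, L) - y_minus(xp, data, L),
                 y_plus(xp, data, L) - y_minus(x, data, L))
    return min(d_L(x, xp, L), spread)


def subdiameter_bound(data, L, axes, k):
    """Grid search for the bound on the k-th subdiameter."""
    best = 0.0
    for x in itertools.product(*axes):
        for xk in axes[k]:
            xp = x[:k] + (xk,) + x[k + 1:]  # x' differs from x only in slot k
            best = max(best, largest_gap(x, xp, data, L))
    return best


# Hypothetical data: two observations of a short function on [0, 1].
data = [((0.0,), 0.0), ((1.0,), 0.5)]
axes = [[i / 20 for i in range(21)]]
bound = subdiameter_bound(data, (1.0,), axes, 0)
```

For these two observations the search returns 0.75, attained for instance at $x = 0.75$ and $x' = 0$: a short function can climb from 0 to 0.75 and still reach the observed value 0.5 at $x = 1$. A grid search can only underestimate the supremum, so in practice a finer grid or a continuous optimizer would be needed to obtain a rigorous upper bound.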



3.3. Examples. As a simple example that can be solved explicitly, consider an
affine function $G\colon \mathcal{X} := [0,1]^K \to \mathbb{R}$:

$$ G(x) = a_0 + \sum_{k=1}^{K} a_k x^k \tag{3.7} $$

for some constants $a_0, a_1, \dots, a_K \in \mathbb{R}$. Suppose that the observation set $\mathcal{O}$ consists
of an $N_1 \times \dots \times N_K$ rectangular grid of equally-spaced points of $[0,1]^K$, with observations
at the corners of the cube. Given $L_k \ge |a_k|$, the gap size for this observation
set is

(3.8)

The exact McDiarmid subdiameters of $G$ satisfy $\mathcal{D}_k[G] = |a_k|$. On the other hand,
$D_k$, the least upper bound on $\mathcal{D}_k[G]$ given the observations $G|_{\mathcal{O}}$ and the Lipschitz
constants $L_1, \dots, L_K$ but not the information that $G$ is affine,¹ is given by

(3.9)

In this case, the error $D_k - \mathcal{D}_k[G]$ is approximately half the upper bound given by
Theorem 3.3 if $L_k \gg |a_k|$, and vanishes if $L_k = |a_k|$.
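The gap size of a rectangular grid can be checked numerically. The sketch below works directly from the definition of the gap size as a sup-inf of $d_L$ with $d_k(a, b) = |a - b|$; the closed form it is compared against, $\sum_k L_k / (2 (N_k - 1))$, is derived for this illustration from that definition and is not quoted from the text. Function names and the sample grid are hypothetical.

```python
def gap_size(axes, L, samples_per_axis=41):
    """Approximate sup_x min_z d_L(x, z) over [0, 1]^K for a product-grid observation set."""
    probe = [i / (samples_per_axis - 1) for i in range(samples_per_axis)]
    total = 0.0
    # d_L is a sum over coordinates and the observation set is a product
    # grid, so the sup-inf separates into one-dimensional problems.
    for axis, Lk in zip(axes, L):
        total += Lk * max(min(abs(x - z) for z in axis) for x in probe)
    return total


def gap_size_closed_form(N, L):
    """Half the grid spacing in each coordinate, weighted by L_k."""
    return sum(Lk / (2.0 * (Nk - 1)) for Nk, Lk in zip(N, L))


# Hypothetical 3 x 5 grid on [0, 1]^2 with L = (1, 2).
N = (3, 5)
L = (1.0, 2.0)
axes = [[i / (Nk - 1) for i in range(Nk)] for Nk in N]
```

For this grid both computations give 0.5: the farthest point from the grid sits at the centre of a cell, half a grid spacing away in each coordinate.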

4. Optimal Bounds on Probabilities 

In this section, in the spirit of [5, 27], the emphasis is on providing optimal bounds
on the probability of failure $\mathbb{P}[G(X) \le \theta]$ rather than bounds on the McDiarmid
diameter $\mathcal{D}[G]$. Theorem 3.1 shows that the optimization problem (3.1) determines
the optimal upper bound on each McDiarmid subdiameter $\mathcal{D}_k[G]$, and hence
(given that $\mathbb{E}[G(X)] \ge m$ and via McDiarmid's inequality (2.9)) an upper bound
on the probability of failure $\mathbb{P}[G(X) \le \theta]$. However, this bound is not necessarily
the sharpest one given the available information, namely that $G$ is $d_L$-short, its
inputs are independent, and that $G|_{\mathcal{O}}$ and $\mathbb{E}[G(X)]$ are as given. The optimal
upper bound on the probability of failure given this information is denoted by
$P[\mathcal{X}, G|_{\mathcal{O}}, L, m]$ (or simply $P$) and is given by

$$ P := \sup_{(g, \mu) \in \mathcal{A}} \mu[g \le \theta], \tag{4.1} $$

where

$$ \mathcal{A} := \left\{ (g, \mu) \,\middle|\, \begin{array}{l}
g\colon \mathcal{X} \to \mathbb{R} \text{ is } d_L\text{-short}, \\
\mu = \mu_1 \otimes \dots \otimes \mu_K \in \bigotimes_{k=1}^{K} \mathcal{P}(\mathcal{X}_k), \\
g = G \text{ on } \mathcal{O}, \text{ and } \mathbb{E}_\mu[g] \ge m
\end{array} \right\}. \tag{4.2} $$

¹ If $G$ is known to be affine and its values are given at $K + 1$ points in general position in
$[0,1]^K$, then $G$ is determined everywhere.

This infinite-dimensional optimization problem over coupled $g \in \mathcal{E}(\mathcal{X}, G|_{\mathcal{O}}, d_L)$
and $\mu \in \mathcal{P}(\mathcal{X})$ is more numerically tractable than it may seem. The next subsection
shows that, for each $g$, the extreme values can be found by searching only
among measures $\mu$ that have a particularly simple structure; furthermore, this simple
structure simplifies the search over $g$ as well.

4.1. Finite-Dimensional Reduction Theorem. Given two points xo,xi G X, 
let C{xq,xi) denote the discrete cube in X that has xq and xi as its "opposite 
corners" : 

C(xo,xi) := {x, := ex\ee{0, 1}^} . (4.3) 

The elements of C{xo, xi) are indexed by the elements of the Hamming cube {0, 1}^: 
for e e {0, 1}^, x^ e C{xo, xi) is the point whose k^^ component is the same as the 
k^^ component of xq if Sk — 0, and the same as the A:*^ component of xi if Sk = 1. 

Recall that a topological space Z is said to be a Radon space if it is separable 
and every Borel probability measure on Z is inner regular [30, 36]; that is, Z is a 
Radon space if it has a countable dense subset and, for every /i S 'P{Z) and every 
Borel set B C Z, 

n{B) = sup{/i(/f) \ K (= B and K is compact}. (4.4) 

In particular, any continuous image of a separable and completely metrizable space 
(a Suslin space) is a Radon space. Compact subsets of M" fall into this category. 

Under the mild technical assumption that each $(X_k, d_k)$ is a Radon space, the reduction theorems of [ ] imply that, for each $d_L$-short $g \colon X \to \mathbb{R}$, the extreme value in (4.1) is obtained among product probability measures $\mu$ such that each marginal distribution $\mu_k$ has support on at most two points of $X_k$, i.e. $\mu_k$ is a convex combination of at most two Dirac measures (point masses). That is, it is sufficient to search over probability measures of the form

\[ \mu = \bigotimes_{k=1}^{K} \mu_k = \bigotimes_{k=1}^{K} \left( p_k \delta_{x_0^k} + (1 - p_k) \delta_{x_1^k} \right) \tag{4.5} \]

that are supported on $C(x_0, x_1)$ for some $x_0, x_1 \in X$; $x_0$, $x_1$ and $p$ are parameters with respect to which we must optimize.

It is a simple matter of combinatorics to convert the product representation (4.5) into the sum representation

\[ \mu = \sum_{\varepsilon \in \{0,1\}^K} \left( \prod_{k=1}^{K} (p_k)^{1 - \varepsilon_k} (1 - p_k)^{\varepsilon_k} \right) \delta_{x_\varepsilon} \tag{4.6} \]

using the indexing scheme (4.3). If $\mu$ is any such measure and $r$ is any real-valued measurable function defined on any superset of $C(x_0, x_1)$, then $\mathbb{E}_\mu[r]$ exists and depends only upon the points $x_\varepsilon$, the values $y_\varepsilon := g(x_\varepsilon)$ and the weights $p_k$. The sum representation (4.6) makes the calculation of $\mathbb{E}_\mu[r]$ very easy:

\[ \mathbb{E}_\mu[r] = \sum_{\varepsilon \in \{0,1\}^K} \left( \prod_{k=1}^{K} (p_k)^{1 - \varepsilon_k} (1 - p_k)^{\varepsilon_k} \right) r(x_\varepsilon). \tag{4.7} \]

In particular, given $g \colon X \to \mathbb{R}$, the mean and probability of failure for $g$ are easily calculated using (4.7) with $r = g$ and $r = \mathbb{1}[g \leq \theta]$ respectively.
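
The following minimal sketch evaluates (4.7) directly. The helper names `vertex_weights` and `expectation`, and the numerical values of `p`, `y` and `theta`, are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from itertools import product
from math import prod, isclose

def vertex_weights(p):
    """Weights of the 2^K support points: eps_k = 0 carries mass p_k, eps_k = 1 carries 1 - p_k."""
    K = len(p)
    return {
        eps: prod(p[k] if eps[k] == 0 else 1.0 - p[k] for k in range(K))
        for eps in product((0, 1), repeat=K)
    }

def expectation(p, r_values):
    """E_mu[r] as in (4.7); r_values maps eps -> r(x_eps)."""
    w = vertex_weights(p)
    return sum(w[eps] * r_values[eps] for eps in w)

# Hypothetical example with K = 2; y[eps] plays the role of g(x_eps).
p = (0.25, 0.5)
y = {(0, 0): -1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 3.0}
theta = 0.0
mean = expectation(p, y)                                        # r = g
prob_fail = expectation(p, {e: float(v <= theta) for e, v in y.items()})  # r = 1[g <= theta]
```

Only the vertex $\varepsilon = (0, 0)$ fails here, so the failure probability equals its weight $p_1 p_2$.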

As the following theorem shows, a search over the finite-dimensional collection of feasible $x_0$, $x_1$, $\{ y_\varepsilon \mid \varepsilon \in \{0,1\}^K \}$ and $p \in [0, 1]^K$ has the same extreme values as the infinite-dimensional problem (4.1)-(4.2), where "feasible" means being $d_L$-short, extending $G|_O$, and having the right mean value:

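Theorem 4.1 (Optimality/finite-dimensional reduction). Suppose that $(X_k, d_k)$ is a Radon space for each $k \in \{1, \ldots, K\}$. Let $\mathcal{A}$ be given by (4.2) and let

\[ \mathcal{A}_\Delta := \left\{ (g, \mu) \,\middle|\, \begin{array}{l} \text{for some } x_0, x_1 \in X, \\ g \colon C(x_0, x_1) \cup O \to \mathbb{R} \text{ is } d_L\text{-short}, \\ \mu = \bigotimes_{k=1}^{K} \mu_k \in \mathcal{P}(C(x_0, x_1)) \cap \bigotimes_{k=1}^{K} \mathcal{P}(X_k), \\ g = G \text{ on } O, \text{ and } \mathbb{E}_\mu[g] \geq m \end{array} \right\}. \tag{4.8} \]

Then

\[ \dim(\mathcal{A}_\Delta) = 2 \sum_{k=1}^{K} \dim(X_k) + 2^K + K, \tag{4.9} \]

\[ \sup_{(g, \mu) \in \mathcal{A}} \mu[g \leq \theta] = \sup_{(g, \mu) \in \mathcal{A}_\Delta} \mu[g \leq \theta], \tag{4.10} \]

\[ \inf_{(g, \mu) \in \mathcal{A}} \mu[g \leq \theta] = \inf_{(g, \mu) \in \mathcal{A}_\Delta} \mu[g \leq \theta]. \tag{4.11} \]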

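Proof. Assertion (4.9) follows from the fact that an element of $\mathcal{A}_\Delta$ is determined by a choice of $x_0 \in X$, $x_1 \in X$, $p \in [0, 1]^K$ as in (4.5) or (4.6), and a choice of $g(x)$ for each of the $2^K$ points of $C(x_0, x_1)$.

To prove (4.10), let $S := \sup_{(g, \mu) \in \mathcal{A}} \mu[g \leq \theta]$. Then

\[
\begin{aligned}
S &= \sup \left\{ \mu[g \leq \theta] \,\middle|\, \begin{array}{l} \text{for some } x_0, x_1 \in X, \\ g \colon X \to \mathbb{R} \text{ is } d_L\text{-short}, \\ \mu = \bigotimes_{k=1}^{K} \mu_k \in \mathcal{P}(C(x_0, x_1)) \cap \bigotimes_{k=1}^{K} \mathcal{P}(X_k), \\ g = G \text{ on } O, \text{ and } \mathbb{E}_\mu[g] \geq m \end{array} \right\} \\
&\leq \sup \left\{ \mu[g \leq \theta] \,\middle|\, \begin{array}{l} \text{for some } x_0, x_1 \in X, \\ g \colon C(x_0, x_1) \cup O \to \mathbb{R} \text{ is } d_L\text{-short}, \\ \mu = \bigotimes_{k=1}^{K} \mu_k \in \mathcal{P}(C(x_0, x_1)) \cap \bigotimes_{k=1}^{K} \mathcal{P}(X_k), \\ g = G \text{ on } O, \text{ and } \mathbb{E}_\mu[g] \geq m \end{array} \right\} \\
&= \sup_{(g, \mu) \in \mathcal{A}_\Delta} \mu[g \leq \theta].
\end{aligned}
\]

The first equality follows from the reduction theorem [ , Theorem 3.1 and Corollary 3.4] and the inequality follows from the fact that only the values of $g$ on the discrete cube $C(x_0, x_1)$ are germane to the probability of failure and the mean constraint; the final equality holds true by definition of the right-hand side.

To see that this inequality must, in fact, be an equality, suppose for a contradiction that $S < \sup_{(g, \mu) \in \mathcal{A}_\Delta} \mu[g \leq \theta]$. Then there exist some $x_0, x_1 \in X$, $p \in [0, 1]^K$ and a $d_L$-short $g \colon C(x_0, x_1) \cup O \to \mathbb{R}$ such that $g = G$ on $O$, $\mathbb{E}_\mu[g] \geq m$ and $\mu[g \leq \theta] > S$. By McShane's extension theorem, there exists an extension of $g$ to a $d_L$-short function $\tilde{g} \colon X \to \mathbb{R}$; necessarily, this extension has $\tilde{g} = G$ on $O$, $\mathbb{E}_\mu[\tilde{g}] = \mathbb{E}_\mu[g] \geq m$ and $\mu[\tilde{g} \leq \theta] = \mu[g \leq \theta] > S$, i.e. $(\tilde{g}, \mu) \in \mathcal{A}$. Hence, $S < \mu[\tilde{g} \leq \theta] \leq S$, which is a contradiction.

This establishes (4.10); the proof of (4.11) is similar, and is omitted. □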
Figure 4.1. A schematic illustration of the variables in the optimization problem (4.12). The black dots show the fixed locations of the legacy observations $G|_O$. The grey dots show the movable locations of the $2^K$ support points $x_\varepsilon$, $\varepsilon \in \{0, 1\}^K$, of the discrete product measure $\mu$ on $X$. The white dots show some feasible values $(x_\varepsilon, y_\varepsilon)$. The marginal distribution $\mu_k$ on $X_k$ assigns mass $p_k$ to $x_0^k$ and mass $1 - p_k$ to $x_1^k$; the mass of $x_\varepsilon$ is determined by (4.6).



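Theorem 4.1 shows that the infinite-dimensional optimization problem (4.1) is equivalent to (i.e. has the same extreme value as) the following finite-dimensional optimization problem, where now $y_\varepsilon$ is written in place of $g(x_\varepsilon)$:

\[
\begin{aligned}
\text{maximize:} \quad & \sum_{\varepsilon \in \{0,1\}^K} \left( \prod_{k=1}^{K} (p_k)^{1 - \varepsilon_k} (1 - p_k)^{\varepsilon_k} \right) \mathbb{1}[y_\varepsilon \leq \theta]; \\
\text{among:} \quad & x_0, x_1 \in X, \quad y \colon \{0,1\}^K \to \mathbb{R}, \quad p \in [0,1]^K; \\
\text{subject to:} \quad & \text{for all } \varepsilon, \varepsilon' \in \{0,1\}^K,\ \varepsilon \neq \varepsilon' \colon \quad |y_\varepsilon - y_{\varepsilon'}| \leq d_L(x_\varepsilon, x_{\varepsilon'}); \\
& \text{for all } \varepsilon \in \{0,1\}^K,\ z \in O \colon \quad |y_\varepsilon - G(z)| \leq d_L(x_\varepsilon, z); \\
& \sum_{\varepsilon \in \{0,1\}^K} \left( \prod_{k=1}^{K} (p_k)^{1 - \varepsilon_k} (1 - p_k)^{\varepsilon_k} \right) y_\varepsilon \geq m.
\end{aligned}
\tag{4.12}
\]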



See Figure 4.1 for a schematic illustration of the problem (4.12). The problem (4.12) has high dimension: assuming that $\dim(X_k) = 1$ for each $k$, (4.12) is a problem in $3K + 2^K$ unknowns with $2^{K-1}(2^K - 1) + |O| 2^K + 1$ distinct constraints. However, as will be seen in Section 5, many of these constraints are redundant or non-binding. Furthermore, we have numerical evidence that in some cases not all of the $2^K$ support points of the measure $\mu$ need to be considered: see the remarks in Section 7 about "dimensional collapse" and Figure 7.3.
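
To give a feel for these counts, the sketch below simply evaluates the two formulas quoted above. The function name `problem_size` and the sample values of $K$ and $|O|$ are illustrative assumptions.

```python
def problem_size(K, n_obs):
    """Unknowns and distinct constraints of (4.12), assuming dim(X_k) = 1 for each k."""
    unknowns = 3 * K + 2 ** K                       # x0, x1, p (K each) and 2^K values y_eps
    pair_constraints = 2 ** (K - 1) * (2 ** K - 1)  # unordered pairs of distinct vertices
    data_constraints = n_obs * 2 ** K               # each vertex against each observation
    mean_constraints = 1
    return unknowns, pair_constraints + data_constraints + mean_constraints

# Illustrative values: three input variables and 32 observations.
size_K3 = problem_size(3, 32)
```

Even for modest $K$ the constraint count is dominated by the $|O| 2^K$ data constraints, which is why the elimination of redundant observations matters.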



4.2. Error Estimate. An error estimate for the maximization problem (4.12) is naturally provided by solving the corresponding minimization problem. That is, the double inequality

\[ \inf_{(g, \mu) \in \mathcal{A}} \mu[g \leq \theta] \leq \mathbb{P}[G(X) \leq \theta] \leq \sup_{(g, \mu) \in \mathcal{A}} \mu[g \leq \theta] \]

is ipso facto the sharpest such inequality on the probability of failure given the available information ($G|_O$, $L$ and $\mathbb{E}[G(X)] \geq m$); hence, the error estimate for $P$ is simply

\[ P - \mathbb{P}[G(X) \leq \theta] \leq \sup_{(g, \mu) \in \mathcal{A}} \mu[g \leq \theta] - \inf_{(g, \mu) \in \mathcal{A}} \mu[g \leq \theta], \tag{4.13} \]

and this inequality is sharp.

4.3. Prototypical Example. The next example, Example 4.2, in which (4.12) is solved explicitly for one observation of a function on the unit interval, illustrates two very important points: the least upper bound on the probability of failure, $P[X, G|_O, L, m]$, can depend discontinuously and non-monotonically on the observed data $G|_O$. It may be useful to first observe that

\[
\sup \left\{ \mu((-\infty, 0]) \,\middle|\, \begin{array}{l} \mu \in \mathcal{P}(\mathbb{R}),\ \mathbb{E}_{Y \sim \mu}[Y] \geq m, \\ \mu \text{ supported on an interval of length} \leq R \end{array} \right\}
= \sup \left\{ \mu((-\infty, 0]) \,\middle|\, \begin{array}{l} \mu = p \delta_{y_0} + (1 - p) \delta_{y_1} \in \mathcal{P}(\mathbb{R}), \\ y_0, y_1 \in \mathbb{R},\ p \in [0, 1], \\ p y_0 + (1 - p) y_1 \geq m, \\ |y_0 - y_1| \leq R \end{array} \right\}
= \left( 1 - \frac{m}{R} \right)_+
\]

and that the maximizer satisfies $y_0 = 0$, $y_1 = R$. The heuristic to bear in mind is that the event $[y_0 = 0]$ can be assigned high probability if the value $y_1$ can be chosen to be sufficiently greater than the prescribed mean $m$.
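
As a sanity check on the two-point reduction above, the following sketch compares the closed form $(1 - m/R)_+$ with a crude grid search over two-point measures. It assumes $m \geq 0$ and $R > 0$; the function names and grid resolution are illustrative choices, not part of the paper.

```python
def two_point_bound(m, R):
    """Closed form (1 - m/R)_+, assuming m >= 0 and R > 0."""
    return max(0.0, 1.0 - m / R)

def brute_force_bound(m, R, n=200):
    """Grid search over a failure value y0 <= 0 and a non-failure value y1 with y1 - y0 <= R."""
    best = 0.0
    for i in range(n + 1):
        y0 = -R * i / n                 # failure value, y0 <= 0
        for j in range(1, n + 1):
            y1 = y0 + R * j / n         # guarantees |y1 - y0| <= R
            if y1 <= 0 or y1 < m:       # y1 must be a non-failure that can carry the mean
                continue
            # Largest p with p*y0 + (1 - p)*y1 >= m.
            p = min(1.0, (y1 - m) / (y1 - y0))
            best = max(best, p)
    return best
```

The grid search attains its maximum at $y_0 = 0$, $y_1 = R$, in agreement with the maximizer stated above.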

Example 4.2. Suppose that a function $G \colon [0, 1] \to \mathbb{R}$ with Lipschitz constant $L > 0$ is observed at a single point, i.e. $O = \{z\}$ for some $z \in [0, 1]$. By symmetry, it is enough to consider the case that $z \in [0, \tfrac{1}{2}]$; for simplicity, suppose that $G(z) > 0$; also, it is no loss of generality to set the failure threshold to be $\theta := 0$.

Suppose it is known that $\mathbb{E}[G(X)] \geq m \in \mathbb{R}$; necessarily, it must hold that $|G(z) - m| \leq L|1 - z|$, otherwise the data and the mean and Lipschitz constraints are mutually contradictory. The least upper bound $P$ on $\mathbb{P}[G(X) \leq 0]$ given the observation $(z, G(z))$, that $\mathbb{E}[G(X)] \geq m$, and the Lipschitz constant $L$, is given in five cases:

\[
P = \begin{cases}
\left( 1 - \dfrac{m}{L - (Lz - G(z))} \right)_+ , & \text{if } G(z) \leq Lz, \\[2ex]
\left( 1 - \dfrac{m}{L - (Lz + G(z))} \right)_+ , & \text{if } Lz < G(z) \leq L|\tfrac{1}{2} - z|, \\[2ex]
\left( 1 - \dfrac{2m}{L + (G(z) - Lz)} \right)_+ , & \text{if } L|\tfrac{1}{2} - z| < G(z) \leq L|1 - 3z|, \\[2ex]
\left( 1 - \dfrac{m}{Lz + G(z)} \right)_+ , & \text{if } G(z) > L \max\{z, 1 - 3z\}, \\[2ex]
0, & \text{if } G(z) > L|1 - z|.
\end{cases}
\tag{4.14}
\]

The five cases are shown in Figure 4.2; surface and contour plots of $P$ as a function of the observed data $(z, G(z))$ were given in the introduction in Figure 1.1. Note well that $P$ is neither continuous nor monotone with respect to $(z, G(z))$: the boundaries among the five cases define "critical lines" in data space, across which there are stark changes in the conclusions that may be inferred from the observed data. Note also that the maximizers for (4.12) may be non-unique: e.g. in Figure 4.2(a), which corresponds to the first case in (4.14), the maximum is attained by any $(x_0, y_0)$, $(x_1, y_1)$ and $p$ satisfying

\[ x_0 \in [0, z - G(z)/L], \quad y_0 = 0, \quad x_1 = 1, \quad y_1 = L - Lz + G(z), \quad p = \left( 1 - \frac{m}{|y_1 - y_0|} \right)_+ . \]

There is a similar lack of uniqueness in Figure 4.2(d). On the other hand, the maximizers in Figures 4.2(b) and (c) are unique.

Figure 4.2. Illustration of the maximizers in Example 4.2 with $L = 1$, panels (a)-(e). The dotted lines show the boundaries of the various cases in $(z, G(z))$ data space. The black dot shows the data point, and the white dots the positions of $(x_0, y_0)$ and $(x_1, y_1)$ that maximize the probability of failure; in each case, $P = \left( 1 - m / |y_1 - y_0| \right)_+$. Note that failure is impossible in case (e).

Note that, for any single observation $(z, G(z))$, the least upper bound on the McDiarmid diameter, $\mathcal{D}[G]$, is simply $L$, and that the bound (4.14) is in each case an improvement on both McDiarmid's inequality

\[ \mathbb{P}[G(X) \leq 0] \leq \exp\left( -\frac{2 m_+^2}{\mathcal{D}[G]^2} \right) = \exp\left( -\frac{2 m_+^2}{L^2} \right) \]

and on the $K = 1$ optimal McDiarmid inequality [ , §4]

\[ \mathbb{P}[G(X) \leq 0] \leq \left( 1 - \frac{m}{\mathcal{D}[G]} \right)_+ = \left( 1 - \frac{m}{L} \right)_+ . \]
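
The one-observation problem of Example 4.2 is small enough to be checked by brute force, independently of any closed form. The sketch below is a grid search over the positions $x_0, x_1 \in [0, 1]$ of a two-point measure whose failure point has value $y_0 = 0$; it assumes $L > 0$, $G(z) > 0$ and $m > 0$, and the function name and grid resolution are illustrative choices.

```python
def max_failure_probability(z, Gz, L, m, n=400):
    """Grid-search upper bound on P[G(X) <= 0] for one observation (z, Gz) on [0, 1].

    Assumes L > 0, Gz > 0 and m > 0, and uses a two-point measure with a
    failure point (x0, 0) of mass p and a non-failure point (x1, y1).
    """
    grid = [i / n for i in range(n + 1)]
    tol = 1e-12
    best = 0.0
    for x0 in grid:
        # y0 = 0 must be compatible with the observation: |0 - Gz| <= L|x0 - z|.
        if Gz > L * abs(x0 - z) + tol:
            continue
        for x1 in grid:
            # Largest y1 allowed by the observation and by the failure point.
            y1 = min(Gz + L * abs(x1 - z), L * abs(x1 - x0))
            if y1 > m:
                # Largest p with (1 - p) * y1 >= m.
                best = max(best, 1.0 - m / y1)
    return best
```

For instance, with $L = 1$, $(z, G(z)) = (0.5, 0.25)$ and $m = 0.5$, the largest admissible $y_1$ is $L - Lz + G(z) = 0.75$, attained at $x_1 = 1$ with any $x_0 \leq z - G(z)/L$, which matches the non-unique maximizers described above.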

5. Redundant and Non-Binding Observations 

In many applications, the aim is not to understand the behaviour of $G$ on the whole of parameter space $X$, but only on some subset $V \subseteq X$, or on the elements of a partition of $X$ [ ]. The observation set $O$ may lie entirely within $V$, or only partially lie within $V$, or lie entirely outside $V$. Heuristically, it seems reasonable that the points of $O$ that are "nearest" to $V$ should be the most important ones, but it is not immediately obvious what "nearest" means. The optimization problems in this paper have two natural notions of relevancy for data points: the relevant data are the ones that correspond to non-trivial constraints, or rather, change the extreme value of the optimization problem.

Even if the aim is to understand the behaviour of $G$ on all of $X$, the problems (3.1) and (4.1)-(4.2) are highly constrained, and their solution is much simplified by elimination of redundant constraints/observations. To that end, this section discusses two notions of redundancy/relevancy for data points [ ]:

• redundant data points do not change the feasible set in the problems (3.1) 
and (4.1)-(4.2); 

• non-binding data points may change the feasible set in the problems (3.1) 
and (4.1)-(4.2), and may even change the extremizer, but do not change 
the extreme value. 

Clearly, every redundant data point is non-binding, but not vice versa. With this 
point of view, the problem of finding "nearest data points" becomes one of finding 
minimal data sets O that are redundancy-free. 

5.1. Redundant Lipschitz Constraints. In problem (4.12), many of the $2^{2K}$ Lipschitz constraints of the form

\[ |y_\varepsilon - y_{\varepsilon'}| \leq d_L(x_\varepsilon, x_{\varepsilon'}) \tag{5.1} \]

are redundant constraints. First, (5.1) is obviously satisfied when $\varepsilon = \varepsilon'$, so there are at most $2^{2K} - 2^K = 2^K(2^K - 1)$ non-redundant constraints of the form (5.1). Secondly, (5.1) is symmetric under interchange of $\varepsilon$ and $\varepsilon'$, and so there are at most $2^K(2^K - 1)/2 = 2^{K-1}(2^K - 1)$ non-redundant constraints of the form (5.1); it suffices to endow $\{0, 1\}^K$ with some total order $\prec$ (e.g. lexicographic ordering) and only verify (5.1) for $\varepsilon \prec \varepsilon'$.

A third source of redundancy is neatly encapsulated in Lemma 2.1: in order to verify that (5.1) holds for all $\varepsilon, \varepsilon' \in \{0, 1\}^K$ (i.e. to show that $g|_{C(x_0, x_1)}$ is $d_L$-short, where $y_\varepsilon = g(x_\varepsilon)$), it is necessary and sufficient to check that (5.1) holds when $\varepsilon$ and $\varepsilon'$ differ in precisely one entry. Geometrically, this corresponds to checking (5.1) not between arbitrary vertices of the cube $C(x_0, x_1)$ but only along edges joining adjacent vertices. There are $K 2^K$ such edges when each is counted in both directions, and so symmetry considerations yield the following result:



Figure 5.1. The observation at $z_0 \in X \setminus V$ is redundant on $V$ with respect to $O := \{z_1, z_2\}$, since its feasible cone contains the set of all $G|_O$-feasible points in $V \times \mathbb{R}$. Contrarily, the observation at $z_0'$ is relevant on $V$ with respect to $O$.



Theorem 5.1 (Relevant Lipschitz constraints). A constraint of the form (5.1) in problem (4.12) is relevant only if $\varepsilon \prec \varepsilon'$ and $\varepsilon_k \neq \varepsilon'_k$ for precisely one $k \in \{1, \ldots, K\}$; otherwise, it is redundant. Hence, there are at most $K 2^{K-1}$ non-redundant constraints of the form (5.1).
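
The count in Theorem 5.1 can be checked by enumerating the relevant pairs explicitly. The sketch below uses lexicographic order on tuples as the total order; the function name `edge_constraints` is an illustrative assumption.

```python
from itertools import product

def edge_constraints(K):
    """Pairs (eps, eps') with eps < eps' lexicographically that differ in exactly one entry."""
    pairs = []
    for eps in product((0, 1), repeat=K):
        for k in range(K):
            if eps[k] == 0:
                # Flipping a 0 to a 1 gives a lexicographically larger neighbour.
                nbr = eps[:k] + (1,) + eps[k + 1:]
                pairs.append((eps, nbr))
    return pairs
```

For $K = 3$ this yields 12 pairs, compared with the 28 unordered pairs of distinct vertices that would otherwise be checked.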

5.2. Redundant Data Points. Given $V \subseteq X$ and $O \subseteq X$ such that $G|_O$ is known, an observation $(z_0, G(z_0)) \in X \times \mathbb{R}$ is said to be redundant on $V$ with respect to $O$ if, for all $(x, y) \in V \times \mathbb{R}$,

\[ \left[ |y - G(z)| \leq d_L(x, z) \text{ for all } z \in O \right] \implies |y - G(z_0)| \leq d_L(x, z_0), \tag{5.2} \]

and is said to be relevant otherwise. That is, a redundant observation is one for which the induced constraint in (3.1) (or (4.1)-(4.2) or (4.12)) is automatically satisfied whenever the constraints induced by $O$ are satisfied; put another way, the set of $G|_O$-feasible points in $V \times \mathbb{R}$ is contained in the cone of $G|_{\{z_0\}}$-feasible points in $V \times \mathbb{R}$. See Figure 5.1 for an illustration.

Proposition 5.2 shows that every (isolated) data point $z \in O \cap V$ is relevant; only data points $z \in O \setminus V$ may be redundant. Furthermore, Theorem 5.3 shows that every point $z \in O \setminus V$ that is sufficiently far away from $V$ is redundant.

Proposition 5.2 (Relevant data points). Let $V \subseteq X$, $O \subseteq X$, and $G|_O$ be given, and suppose that $d_L$ is a metric. If $z_0 \in O \cap V$ and $z_0$ is an isolated point of $O$, then $z_0$ is relevant on $V$ with respect to $O \setminus \{z_0\}$.

Proof. Let $z'$ be the closest point of $O \setminus \{z_0\}$ to $z_0$ (if there is more than one such point, then choose any such point). Then any value

\[ y \in [G(z') - d_L(z_0, z'), G(z') + d_L(z_0, z')] \]

is feasible with respect to $O \setminus \{z_0\}$. Since $z_0$ is an isolated point of $O$ and $d_L$ is a metric, this interval has non-zero length. However, the only such $y$ that is feasible with respect to $O$ is $G(z_0)$. Hence, $z_0$ supplies a non-trivial constraint and is relevant on $V$ with respect to $O \setminus \{z_0\}$. (Note that if $V$ is, say, a subset of $\mathbb{R}^n$ with non-empty interior, then this argument can be applied on a neighbourhood of $z_0$, thereby demonstrating relevancy of $z_0$ to a non-trivial set.) □
The next result gives a sufficient condition for observations $z_0 \in O \setminus V$ to be redundant. Say that $y \in X$ is between $x \in X$ and $z \in X$ if

\[ d_L(x, z) = d_L(x, y) + d_L(y, z), \tag{5.3} \]

and that $y$ is between $V \subseteq X$ and $W \subseteq X$ if (5.3) holds for every $x \in V$ and $z \in W$. Note well that in the prototypical case that $d_L$ is the $\ell^1$ Manhattan metric on $\mathbb{R}^K$, the set of points between $x$ and $z$ is not the Euclidean line segment joining them, but the closed convex hull $\overline{\mathrm{co}}(C(x, z))$, i.e. the compact cuboid with faces perpendicular to the coordinate axes and $x$ and $z$ as its opposite corners.
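
The betweenness condition (5.3) is easy to test numerically. The sketch below assumes the weighted Manhattan form $d_L(x, z) = \sum_k L_k |x^k - z^k|$ for the metric; that form, the function names, and the sample points are assumptions made for illustration.

```python
def weighted_l1(L, x, z):
    """d_L(x, z) = sum_k L_k |x^k - z^k| (the Manhattan-type case, assumed here)."""
    return sum(Lk * abs(a - b) for Lk, a, b in zip(L, x, z))

def is_between(L, x, y, z, tol=1e-12):
    """True if d_L(x, z) = d_L(x, y) + d_L(y, z), i.e. (5.3) holds up to tol."""
    gap = weighted_l1(L, x, y) + weighted_l1(L, y, z) - weighted_l1(L, x, z)
    return abs(gap) <= tol

L = (1.0, 2.0)
x, z = (0.0, 0.0), (1.0, 1.0)
```

Any point of the axis-aligned box with corners `x` and `z` passes the test, including points that are far from the straight segment joining them, whereas points outside the box fail it.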

Theorem 5.3 (Redundant data points). Let $V \subseteq X$, $O \subseteq X$, $L$ and $G|_O$ be given. Fix $z_0 \in O \setminus V$. Suppose that $p \in X$ is between $V$ and $z_0$, and that there exist $z', z'' \in O \cap V$ satisfying

\[ G(z') + d_L(z', p) \leq G(z_0) + d_L(z_0, p), \tag{5.4} \]
\[ G(z'') - d_L(z'', p) \geq G(z_0) - d_L(z_0, p). \tag{5.5} \]

Then $z_0$ is redundant on $V$ with respect to $O \cap V$.

Proof. Let $(x, y) \in V \times \mathbb{R}$ be a feasible point with respect to $G|_{O \cap V}$, i.e.

\[ |y - G(z)| \leq d_L(x, z) \text{ for each } z \in O \cap V, \]

and suppose for a contradiction that $|y - G(z_0)| > d_L(x, z_0) \geq 0$. If $y > G(z_0)$, then the assumption ad absurdum implies that $y > G(z_0) + d_L(x, z_0)$. Hence,

\[
\begin{aligned}
|y - G(z')| &\geq y - G(z') \\
&> G(z_0) + d_L(x, z_0) - G(z') \\
&\geq d_L(z', p) - d_L(z_0, p) + d_L(x, z_0) && \text{by (5.4)} \\
&= d_L(z', p) + d_L(x, p) && \text{since } p \text{ is between } V \text{ and } z_0 \\
&\geq d_L(x, z') && \text{by the triangle inequality,}
\end{aligned}
\]

which contradicts the feasibility of $(x, y)$ with respect to $G|_{O \cap V}$. Similarly, if $y < G(z_0)$, then (5.5) implies that

\[ |y - G(z'')| > d_L(x, z''), \]

which is again a contradiction. This completes the proof. □

If the closure $\overline{V}$ of $V$ is a compact rectangular box $\prod_{k=1}^{K} [\alpha^k, \beta^k]$ then, for each $z_0 \in O \setminus V$, there is a natural choice for the point $p$ with respect to which conditions (5.4) and (5.5) can be checked: the unique point $p_{z_0, V} \in \overline{V}$ that is closest to $z_0$, where

\[
p_{x, V}^k := \begin{cases}
\alpha^k, & \text{if } x^k < \alpha^k, \\
x^k, & \text{if } \alpha^k \leq x^k \leq \beta^k, \\
\beta^k, & \text{if } x^k > \beta^k.
\end{cases}
\tag{5.6}
\]

It is easy to see that $p_{z_0, V}$ is between $V$ and $z_0$. This choice of $p$ validates the heuristic that observations far away from $V$ ought to be redundant, since (5.4) and (5.5) are certain to hold when $V$ is bounded and $d_L(z_0, V)$ is large enough.
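
A minimal sketch of this check follows: it clamps $z_0$ to the box as in (5.6) and then tests the sufficient conditions (5.4) and (5.5) against the observations that lie inside $V$. It assumes the weighted Manhattan form of $d_L$; all function names and data values are hypothetical.

```python
def project_to_box(x, alpha, beta):
    """Componentwise clamp of x to the box prod_k [alpha_k, beta_k], as in (5.6)."""
    return tuple(min(max(xk, a), b) for xk, a, b in zip(x, alpha, beta))

def weighted_l1(L, x, z):
    """Assumed metric: d_L(x, z) = sum_k L_k |x^k - z^k|."""
    return sum(Lk * abs(a - b) for Lk, a, b in zip(L, x, z))

def sufficient_redundancy(L, z0, G0, inside, alpha, beta):
    """Test (5.4) and (5.5) at p = projection of z0; `inside` lists (z, G(z)) with z in V."""
    p = project_to_box(z0, alpha, beta)
    upper0 = G0 + weighted_l1(L, z0, p)
    lower0 = G0 - weighted_l1(L, z0, p)
    has_upper = any(Gz + weighted_l1(L, z, p) <= upper0 for z, Gz in inside)  # (5.4)
    has_lower = any(Gz - weighted_l1(L, z, p) >= lower0 for z, Gz in inside)  # (5.5)
    return has_upper and has_lower

# Hypothetical one-dimensional data: V = [0, 1], one observation inside V.
inside = [((0.5,), 0.2)]
far = sufficient_redundancy((1.0,), (3.0,), 0.0, inside, (0.0,), (1.0,))
near = sufficient_redundancy((1.0,), (1.1,), 0.0, inside, (0.0,), (1.0,))
```

A `True` result certifies redundancy via Theorem 5.3; a `False` result is inconclusive, because the conditions are only sufficient.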
5.3. Non-Binding Data Points. A more interesting notion of the information content of the data points $(z, G(z))$ is not relevancy but bindingness. Whereas redundancy concerns the set of feasible points for an optimization problem, a non-binding constraint (or data point) is one that perhaps changes the feasible set but does not change the extreme value of the problem.

Given $O \subseteq X$ such that $G|_O$ is known, an observation $(z_0, G(z_0)) \in X \times \mathbb{R}$ is said to be

• non-binding for $D_k$ with respect to $O$ if

\[ D_k[X, G|_{O \cup \{z_0\}}, L] = D_k[X, G|_O, d_L]; \]

• non-binding for $P$ with respect to $O$ if

\[ P[X, G|_{O \cup \{z_0\}}, L, m] = P[X, G|_O, L, m]. \]

Otherwise, an observation is said to be binding. Note well that the inclusion of a binding observation strictly changes the extreme value of the optimization problems, not just the set of extremizers.

Clearly, if including an observation at $z_0$ does not change the feasible set for, say, the $D_k$ problem (3.1), then including it does not change the extreme value of (3.1); that is, every redundant data point is non-binding, and every binding data point is relevant. The converse implications, however, are false: in general, there are data points that do change the feasible set for the optimization problems for $D_k$ and $P$, but do not change the extreme values. See Figure 5.2 for some illustrations based upon the earlier Example 4.2. See also Figure 5.3, which illustrates the set of all second data points $(z_2, G(z_2)) \in [0, 1] \times \mathbb{R}$ that are redundant with respect to the first data point from Example 4.2.

A sufficient (but not necessary) condition for the extreme value of an optimization problem to be unchanged upon the introduction of a new constraint is that the extremizer of the original problem is feasible with respect to the new constraint. Thus, a sufficient condition for a data point to be non-binding is provided by the following result:

Proposition 5.4 (Non-binding data points). Let $O \subseteq X$, $z_0 \in X$, $L$ and $G|_{O \cup \{z_0\}}$ be given.

(1) Let $(x, y, x', y')$ be a maximizer for (3.1) with observations $O$. If

\[ |y - G(z_0)| \leq d_L(x, z_0) \text{ and } |y' - G(z_0)| \leq d_L(x', z_0), \tag{5.7} \]

then $z_0$ is non-binding and $D_k[X, G|_{O \cup \{z_0\}}, L] = D_k[X, G|_O, d_L]$.

(2) Let $(x_0, x_1, y, p)$ be a maximizer for (4.12) with observations $O$. If

\[ |y_\varepsilon - G(z_0)| \leq d_L(x_\varepsilon, z_0) \text{ for all } \varepsilon \in \{0, 1\}^K, \tag{5.8} \]

then $z_0$ is non-binding and $P[X, G|_{O \cup \{z_0\}}, L, m] = P[X, G|_O, L, m]$.

Proof. Since $O \subseteq O \cup \{z_0\}$, every $(x, y, x', y')$ that is feasible for (3.1) with observations $O \cup \{z_0\}$ is also feasible for (3.1) with observations $O$. Hence

\[ D_k[X, G|_O, d_L] \geq D_k[X, G|_{O \cup \{z_0\}}, L]. \]

Now let $(x, y, x', y')$ be a maximizer for (3.1) with observations $O$ and suppose that (5.7) holds; then $(x, y, x', y')$ satisfies the criteria to be a feasible point for (3.1) with observations $O \cup \{z_0\}$, and has the same objective function value $|y - y'|$. Hence,

\[ D_k[X, G|_O, d_L] \leq D_k[X, G|_{O \cup \{z_0\}}, L], \]

and the claim for $D_k$ follows. The proof of the claim for $P$ is analogous. □

Figure 5.2. Additional binding and non-binding data points for the one-dimensional Example 4.2. As before, black dots show data points and white dots the locations of maximizing $(x_0, y_0)$ and $(x_1, y_1)$, with $P = \left( 1 - m / |y_1 - y_0| \right)_+$. Panels: (a) (non-unique) maximizer for the probability of failure with one data point; (b) a non-binding new data point, for which the maximizer does not change (cf. Figure 5.3(a)); (c) a non-binding new data point, for which the maximizer changes but the maximum value does not; (d) a binding new data point, for which the maximizer and maximum value both change; (e) two binding new data points that together render failure impossible.

Note well that the converse of Proposition 5.4 is false in general: the introduction 
of a new data point may render some of the previous (non-unique) maximizers 
infeasible but still fail to change the maximum value of the problem. 

Nevertheless, the simple algebraic conditions of Proposition 5.4 suggest a practical method for calculating $D_k$ or $P$ if the data set $O = \{z_1, \ldots, z_N\}$ is a large finite set that is believed to contain many redundant points. The idea is to introduce the data points one at a time and only solve (3.1) (for $D_k$) or (4.12) (for $P$) when strictly necessary. In the following algorithm, $O_i \subseteq O$ will denote the data points (constraints) that are enforced at iteration $i$, while $O'_i \subseteq O$ will denote those that are potentially binding and will be checked for feasibility at iteration $i$. Note well that, in general, $O_i \cup O'_i \subsetneq O$.

Figure 5.3. Panels (a)-(d): in grey, those locations for the second data point in the one-dimensional Example 4.2 that are non-binding with respect to the first point (the black dot); cf. (a)-(d) of Figure 4.2.

Algorithm 5.5. Initialize with $O_0 = \emptyset$ and $O'_0 = O$. Then, for $i = 1, 2, \ldots$,

(1) For each $z \in O'_{i-1}$, calculate $D_k[X, G|_{O_{i-1} \cup \{z\}}, L]$.

(2) Let $C_i \subseteq O'_{i-1}$ be the set of maximizers of $z \mapsto D_k[X, G|_{O_{i-1} \cup \{z\}}, L]$ among $z \in O'_{i-1}$.

(3) Set $O_i := O_{i-1} \cup C_i$ and calculate $D_k[X, G|_{O_i}, L]$.

(4) Let $O'_i$ consist of those $z \in O \setminus O_i$ such that the extremizer for $D_k[X, G|_{O_i}, L]$ is infeasible with respect to $(z, G(z))$ (i.e. fails (5.7)), and hence is possibly binding.

(5) Terminate if $O'_i = \emptyset$.

The algorithm for $P$ is analogous, with (5.8) in place of (5.7).
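
The general pattern behind Algorithm 5.5 (solve with few constraints, test the extremizer against the remaining data, enforce only what is violated) can be sketched generically. The code below is a simplified variant, not the algorithm as stated: instead of the selection rule of steps (1)-(2), it adds the first violated point at each iteration. The callables `solve` and `violates` and the toy problem are hypothetical stand-ins for solving (3.1) or (4.12) and for checking (5.7) or (5.8).

```python
def incremental_bound(points, solve, violates):
    """Enforce data points one at a time until the current extremizer satisfies all of them.

    points   : list of data points
    solve    : callable(active_list) -> (value, extremizer) for the restricted problem
    violates : callable(extremizer, point) -> True if the extremizer is infeasible for point
    """
    active = []
    value, extremizer = solve(active)
    n_solves = 1
    while True:
        candidates = [z for z in points if z not in active and violates(extremizer, z)]
        if not candidates:
            # Every remaining point is satisfied by the extremizer, hence non-binding.
            return value, active, n_solves
        # Simplification: add only the first violated point, then re-solve.
        active.append(candidates[0])
        value, extremizer = solve(active)
        n_solves += 1

# Toy stand-in problem: maximize y in [0, 10] subject to y <= c for each data value c.
def toy_solve(active):
    y = min([10] + active)
    return y, y

def toy_violates(y, c):
    return y > c

result = incremental_bound([7, 9, 3, 8, 5], toy_solve, toy_violates)
```

In the toy problem only two of the five data values are ever enforced, and three solves suffice; termination is justified by the same reasoning as Proposition 5.4.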

In the numerical examples that have been considered so far, it has been observed 
that relatively few elements of O determine Dk or P, even though, in principle, 
every element of O could supply a binding constraint. This situation is somewhat 
analogous to the simplex algorithm in linear programming: in the theoretical worst 
case, the simplex method can take exponential time [15], but it "usually" requires 
polynomial time in practice. We will reserve detailed numerical analysis of this 
algorithm for a future work. 
6. Further Remarks

6.1. Feasible Lipschitz Constants. Given $O \subseteq X$ and the associated observations $G|_O$, let $\mathrm{Lip}(G|_O)$ denote the set of Lipschitz constants for $G$ that are consistent with the observations $G|_O$, i.e.

\[ \mathrm{Lip}(G|_O) := \left\{ L \in \mathbb{R}_+^K \,\middle|\, |G(z) - G(z')| \leq d_L(z, z') \text{ for all } z, z' \in O \right\}. \tag{6.1} \]

It is easy to check that, for any given $O \subseteq X$ and $G|_O$, $\mathrm{Lip}(G|_O)$ is a convex subset of $\mathbb{R}^K$. This remains the case if additional inequality constraints on the $L_k$ are supplied: e.g. if it is required that $L_k^- \leq L_k \leq L_k^+$, then

\[ \mathrm{Lip}'(G|_O) := \left\{ L \in \mathrm{Lip}(G|_O) \,\middle|\, L_k^- \leq L_k \leq L_k^+ \text{ for each } k \in \{1, \ldots, K\} \right\} \]

is a convex set.
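
Membership of $\mathrm{Lip}(G|_O)$ amounts to finitely many inequalities, one per pair of observations. The sketch below assumes the form $d_L(z, z') = \sum_k L_k |z^k - z'^k|$, under which each inequality is linear in $L$ (which is what makes the set convex); the function names and data are hypothetical.

```python
from itertools import combinations

def d_L(L, x, z):
    """Assumed form of the metric: d_L(x, z) = sum_k L_k |x^k - z^k|."""
    return sum(Lk * abs(a - b) for Lk, a, b in zip(L, x, z))

def in_lip(L, data, tol=1e-12):
    """True if L is consistent with the observations, cf. (6.1); data lists (z, G(z))."""
    return all(Lk >= 0 for Lk in L) and all(
        abs(g1 - g2) <= d_L(L, z1, z2) + tol
        for (z1, g1), (z2, g2) in combinations(data, 2)
    )

# Hypothetical observations in two dimensions.
data = [((0.0, 0.0), 0.0), ((1.0, 0.0), 1.0), ((0.0, 1.0), 2.0)]
La, Lb = (1.0, 2.0), (3.0, 2.5)
midpoint = tuple((a + b) / 2 for a, b in zip(La, Lb))
```

Here `La` and `Lb` are both feasible, and so is their midpoint, as convexity requires.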

It is not immediately clear what one should regard as the "smallest" element of $\mathrm{Lip}(G|_O)$. However, recall that Theorem 3.3 shows that the gap size of the data set with respect to $d_L$ controls the error $D_k - \mathcal{D}_k[G]$:

\[ 0 \leq D_k[X, G|_O, d_L] - \mathcal{D}_k[G] \leq 4 r(X, O, d_L). \]

Therefore, it makes sense to search among the feasible Lipschitz constants $L \in \mathrm{Lip}(G|_O)$ for one $L^*$ that minimizes the gap size. Unfortunately, this is not a convex minimization problem in the sense of [7, §4.2], since $r(X, O, d_L)$ is not a convex function of $L$: for each $x \in X$, $d_L(x, O)$ is a concave function of $L$, and a supremum of a family of concave functions can be badly behaved. $D_k[X, G|_O, d_{L^*}]$ is then the upper bound on $\mathcal{D}_k[G]$ that has the tightest error estimate that can be justified by the data $G|_O$ alone; of course, further data might invalidate this scenario.

6.2. Sensitivity and Robustness Analysis. In some applications, there may be doubt about the correct values for the Lipschitz constants $L_1, \ldots, L_K$. Such doubt necessarily propagates to doubt about the validity of the bounds $D_k$ and $P$: however, it does not do so in an entirely uncontrolled fashion. It is possible to perform a (local or global) sensitivity/robustness analysis of $D_k$ and $P$ with respect to $L_1, \ldots, L_K$ and thereby determine which Lipschitz constants strongly control the values of $D_k$ and $P$; the key Lipschitz constants can be identified for further, more detailed, research; the less important ones can be (relatively) safely accepted as they stand.

Notably, as in the optimal concentration-of-measure inequalities of McDiarmid and Hoeffding type [27, §4], some $L_k$ may turn out to have zero influence on $D_k$ and $P$. Indeed, by rescaling arguments, it is easy to see that just as $D_k$ and $P$ may be discontinuous as functions of the observed data $G|_O$ (as in Example 4.2), $D_k$ and $P$ may be discontinuous as functions of $L$.

7. Numerical Examples 

This section covers the numerical calculation of P in two example cases. The 
first case (Subsection 7.2) is a validation exercise, in which the closed-form results 
of Example 4.2 are replicated numerically. The second case (Subsection 7.3) is a 
more involved calculation, in which the response function is a function of three 
variables and the data set comes from an archive of mechanical experiments. 
7.1. Overview of the Numerical Method. A description of the OUQ algorithm, as implemented in the mystic framework [21], can be found in [22, 23]. In those earlier implementations of OUQ, it was the case that the response function was known/modelled exactly, and so it was only necessary to numerically represent the unknown probability measure $\mu$. To implement the "Legacy OUQ" method of this paper, it was necessary to extend the existing OUQ algorithm in the following ways:

• Mystic's product_measure class, which provides a numerical representation of a probability measure $\mu$ of the form (4.5)/(4.6), was extended to associate to each of the support points of a product measure $\mu$ a scalar value, thereby providing a numerical representation of a pair $(g, \mu) \in \mathcal{A}_\Delta$ as in (4.8). Such an object will be referred to as a scenario and denoted X; typically, X is stored in the "compressed" form of $(x_0, x_1, y, p)$ as used in (4.12) and elsewhere, but is sometimes converted into other representations.

• A dataset class, which numerically represents the observed data $G|_O$ and the cone structure that comes from the Lipschitz constants, was added. As alluded to in the previous bullet point, a scenario object X can be regarded as a dataset object by "forgetting" the probabilistic structure and remembering only the points in input parameter space and their associated output values. Below, the legacy data set $G|_O$ will be denoted data.

• Methods were added to both of these classes to allow for efficient calculation of $d_L$ distances (and hence whether or not a given scenario object X is $d_L$-short with respect to itself and data) and integrals with respect to $\mu$ as in (4.7).

The overall structure of the optimization calculations is that of an outer and an 
inner optimization loop. The outer loop generates the next population of candidate 
scenario objects X to which the objective function F (the probability-of-failure 
functional) will be applied. The inner loop applies the constraints (bounds, mean, 
and shortness) to those generated candidates X so that F is only ever evaluated on 
scenario objects X'=C(X) that satisfy the constraints imposed by C. 

The outer optimization loop, as described in [22], is used with the "expanded
solver interface" described in [23]. A differential evolution solver [28, 32] was used
with termination condition ChangeOverGenerations, population size npop = 32, 
ngen = 100, and tol = 10~^; that is, the calculations used populations of 32 
candidates and terminated when the best objective function value had shown no 
improvement greater than 10~^ for 100 consecutive iterations of the outer loop. 
The objective function value F(X), when X represents $(g, \mu)$, is the probability of failure for $g$ under $\mu$ as defined in (4.7) with $r(x) := \mathbb{1}[g(x) \leq \theta]$. The optimizer generates values for the weights and positions of the measure points in each coordinate direction. For Legacy OUQ, the optimizer must also generate scalar values $y = g(x)$ for each point $x$ in the support of the product measure $\mu$.

In mystic, constraints are solved explicitly through algebraic or numerical means. A constraints solver C is built to impose the set of constraints on the candidate scenario generated by the outer loop optimizer at each iteration. Constraints solvers are functions that map any (not necessarily feasible) scenario object X to a scenario object X'=C(X) that satisfies all of the required constraints. Thus, only valid solutions to the constraints equations are seen by the objective function F. Effectively, the value of the objective function evaluated by the outer loop optimizer at each step is F(C(X)). In contrast, standard optimizers use penalty functions P (and often dynamic multipliers k) so that the objective function F as evaluated by the optimizer is in fact F(X)+k*P(X); this approach corrupts the structure of the problem by severing an explicit connection to the constraints.
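
The difference between the two approaches can be seen in a few lines. The sketch below is plain Python and does not use mystic; the helpers `compose` and `penalized` and the toy functions are hypothetical names introduced only to contrast F(C(X)) with F(X)+k*P(X).

```python
def compose(F, C):
    """Objective seen by the outer optimizer when constraints are solved: X -> F(C(X))."""
    return lambda X: F(C(X))

def penalized(F, P, k):
    """Penalty-style objective for comparison: X -> F(X) + k * P(X)."""
    return lambda X: F(X) + k * P(X)

# Hypothetical toy problem: X is a list of positive weights that should sum to 1.
def normalize(X):
    total = sum(X)
    return [w / total for w in X]

def first_weight(X):
    return X[0]

def sum_violation(X):
    return abs(sum(X) - 1.0)

X = [2.0, 1.0, 1.0]
constrained_value = compose(first_weight, normalize)(X)           # F evaluated on a feasible point
penalized_value = penalized(first_weight, sum_violation, 10.0)(X)  # F evaluated on an infeasible point
```

With the constraints solver, F is only ever evaluated at a point that satisfies the constraint; with the penalty, F is evaluated at the infeasible input and the reported value mixes objective and violation.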

The constraints function used in the Legacy OUQ algorithm first builds the scenario object X from the optimizer-generated inputs to the objective function. A first constraints solver C' is then applied: this ensures that the weights of each of the underlying discrete measures sum to 1.0. A second constraints solver C'' is then applied, which imposes the mean constraint $\mathbb{E}_\mu[g] \geq m$; this is done through mystic's impose_mean function, which, in our example, shifts the coordinates of X so that X has the desired mean. At this point, the candidate scenario objects X generated by the optimizer have passed through the constraints solvers C' and C'', and only provide the objective function F with valid solutions X'=C*(X)=C''(C'(X)) of the given bounds and mean constraints. If the resulting candidate scenario object X' is not $d_L$-short with respect to itself and to the legacy data data, i.e. the inequality

\[ |g(x) - g(x')| \leq d_L(x, x') \]

fails for some $x$ in the support of $\mu$ and some $x'$ either in the support of $\mu$ or in $O$, then mystic's set_feasible function is used in a third constraints solver C to impose the desired shortness on the scenario object X'. Unlike for C' and C'', the constraints in C cannot be imposed algebraically. Instead, the application of C is an inner optimization loop.
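
The two algebraic constraint steps can be mimicked in a stand-alone way. The sketch below is not mystic's implementation: `normalize_weights` and `impose_mean_by_shift` are hypothetical helpers, and shifting the output values by a constant is an assumed way of meeting the mean constraint, chosen here for simplicity.

```python
def normalize_weights(weights):
    """Rescale non-negative weights so that they sum to 1.0 (role of the first solver)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]

def impose_mean_by_shift(weights, values, m):
    """Shift all values by one constant so that the weighted mean is at least m."""
    mean = sum(w * v for w, v in zip(weights, values))
    shift = max(0.0, m - mean)   # no shift is needed if the mean constraint already holds
    return [v + shift for v in values]

w = normalize_weights([1.0, 1.0, 2.0])
y = impose_mean_by_shift(w, [0.0, 1.0, 2.0], m=2.0)
new_mean = sum(wi * yi for wi, yi in zip(w, y))
```

A constant shift leaves all differences between the shifted values unchanged, but it can break shortness with respect to the fixed legacy data, which is one reason a separate shortness step is still required afterwards.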

The details of how mystic checks for shortness and how feasibility is imposed on 
a scenario object are worth a little further discussion. 

The check for shortness of a scenario X with respect to the legacy data data is done by first converting X into a dataset object with the load method, and then applying the is_short function, which calculates a 2-dimensional array dist with elements $|y - y'| - d_L(x, x')$ for each combination of $x, x'$ from the two collections of support points (here, the legacy data set data and the scenario X regarded as a data set). The result is a matrix corresponding to the distances required for shortness, where all distances less than a given tolerance short_tol are treated as acceptably close to zero; if all entries of the matrix dist are at most short_tol, then, modulo that tolerance, X is $d_L$-short with respect to data; otherwise, the positivity of the matrix dist provides a numerical measure of the failure of shortness. Shortness of X with respect to itself is calculated similarly.
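
The shortness check just described can be written out in plain Python. This is a stand-alone sketch and not mystic's is_short: the function names, the assumed form $d_L(x, x') = \sum_k L_k |x^k - x'^k|$, and the sample points are all illustrative.

```python
def d_L(L, x, z):
    """Assumed form of the metric: d_L(x, z) = sum_k L_k |x^k - z^k|."""
    return sum(Lk * abs(a - b) for Lk, a, b in zip(L, x, z))

def shortness_matrix(L, pts_a, pts_b):
    """Entries |y - y'| - d_L(x, x') for (x, y) in pts_a and (x', y') in pts_b."""
    return [[abs(ya - yb) - d_L(L, xa, xb) for xb, yb in pts_b] for xa, ya in pts_a]

def check_short(L, pts_a, pts_b, tol=0.0):
    """True if every entry of the matrix is at most tol."""
    return all(e <= tol for row in shortness_matrix(L, pts_a, pts_b) for e in row)

def violation(L, pts_a, pts_b, tol=0.0):
    """Sum of max(0, entry - tol): a scalar measure of the failure of shortness."""
    return sum(max(0.0, e - tol) for row in shortness_matrix(L, pts_a, pts_b) for e in row)

# Hypothetical one-dimensional example with L = 1.
L = (1.0,)
data = [((0.0,), 0.0)]
scenario_bad = [((0.5,), 0.25), ((1.0,), 2.0)]
scenario_ok = [((0.5,), 0.25)]
```

The scalar `violation` is zero exactly when `check_short` holds, which makes it a natural quantity for an inner optimization loop to drive to zero.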

Shortness is imposed through an inner optimization loop that solves for a can- 
didate scenario object X' for which dist<=short_tol. Similarly to the outer 
optimization loop, this inner optimization loop uses a differential evolution solver 
— however, the termination condition used in the inner loop is VTR [21], and 
solver parameters are set to npop = 40 and tol = 10^^. The constraints solver 
C* described above is reused by the inner optimization loop to ensure that the 
constraints on the weights and mean are also respected by C. For shortness, the 
objective function for the inner loop is the sum over all elements of the matrix
max(0.0, dist - short_tol). When the inner loop terminates, a candidate
scenario object X'=C(X) is produced that satisfies all constraints imposed by the 
solver C (and thus also C*). 
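
As a hedged illustration of the inner-loop objective, the following sketch evaluates the penalty for a given matrix dist. The function name shortness_penalty and the numerical values are hypothetical; in the actual implementation the matrix is produced by mystic and the penalty is driven to zero by the differential evolution solver.

```python
def shortness_penalty(dist, short_tol=0.0):
    """Sum over all entries of max(0.0, dist - short_tol).

    The penalty is zero exactly when every entry is at most short_tol,
    i.e. when the candidate scenario is short modulo the tolerance.
    """
    return sum(max(0.0, entry - short_tol) for row in dist for entry in row)


# Hypothetical 2 x 2 matrix of values |y - y'| - d_L(x, x').
dist_example = [[-0.25, 1.5],
                [0.2, -1.0]]
penalty = shortness_penalty(dist_example, short_tol=0.1)
```

Only the entries exceeding short_tol contribute, so an optimizer that minimizes this quantity is pushed towards scenarios for which dist<=short_tol holds entrywise.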

The solution produced by the outer optimization loop is a scenario object C(X) 
that both satisfies all of the above constraints and maximizes the probability of 
failure F(C(X)). 
Figure 7.1. Log-linear plot illustrating typical numerical convergence
of the approximate maxima Pnum as a function of algorithmic
iterations in the numerical implementation of Example 4.2. Note 
the approximate convergence rate, error « 10^'^+"/^''). After the 
last iteration shown in each plot, |Pnum — -P| < 10^^, i.e. the two 
are equal up to the convergence tolerance. 

7.2. One Data Point in One Dimension. As a first exercise in applying the 
protocol, we numerically replicate the exact values for P in Example 4.2. Numerical 
convergence plots are given in Figure 7.1. 

It may be useful to note that the dimensionality of the problem can be slightly
reduced, and more accurate results obtained more quickly, if, rather than searching
over

\[ (x_0, x_1, y_0, y_1, p) \in [0,1]^2 \times \mathbb{R}^2 \times [0,1], \]

one forces $(x_1, y_1)$ to be a failure, and therefore searches over

\[ (x_0, x_1, y_0, y_1, p) \in [0,1]^2 \times \mathbb{R} \times \{0\} \times [0,1]. \]

The same value for P is attained using either approach; if $y = 0$ is not a feasible
value for any $x \in [0, 1]$, then the optimizer detects this fact and reports that the
feasible set is empty, from which we infer that the maximum probability of failure
is zero.

7.3. Three-Dimensional Example. This subsection reports the results of imple- 
menting the above method for obtaining optimal bounds on probabilities using a 
data set generated by physical experiments. 

Figure 7.2. Numerical results for the least upper bound on
$\mathbb{P}[G(h, \alpha, v) \leq \theta]$ for various $\theta$. Note the close agreement with
the Markov bound (7.2) (grey line): for $\theta \geq 2.0$ mm^2, the difference
is less than the change-over-generations criterion of 10~^.
Convergence plots are given in Figure 7.4.

In these experiments, a steel ball of diameter 0.07 inches is fired at an aluminium
plate of thickness $h$ (measured in inches). The projectile impacts the plate at
an angle $\alpha$ (measured in degrees) away from the plate normal, and at a speed
$v$ (measured in metres per second). After the impact event, the area $G(h, \alpha, v)$
(in square millimetres) of the perforation in the plate caused by the impact event
is measured and recorded. The results of a series of such impact tests are given
in Table 9.1; this forms the legacy data set $G|_{\mathcal{O}}$ for this example. The protocol
described above is now applied over the parameter space

\[ (h, \alpha, v) \in \mathcal{X} := [0.062, 0.125] \text{ in} \times [0, 30] \text{ deg} \times [2300, 3200] \text{ m/s} \]

and using the data points in Table 9.1. The data are, in fact, multi-valued (two
distinct perforation areas were observed for the same input triplet $(h, \alpha, v)$).
Therefore, the response function is not Lipschitz continuous, and so we apply a natural
generalization of the above protocol using the following "Lipschitz with tolerance"
constraint:

\[ |G(h, \alpha, v) - G(h', \alpha', v')| \leq d_L((h, \alpha, v), (h', \alpha', v')) + T, \tag{7.1} \]

where

\[ L := (L_h, L_\alpha, L_v), \quad T := 1.0 \text{ mm}^2, \]

\[ L_h := 175.0 \text{ mm}^2/\text{in}, \quad L_\alpha := 0.075 \text{ mm}^2/\text{deg}, \quad L_v := 0.1 \text{ mm}^2/(\text{m/s}). \]

Condition (7.1) is satisfied by the observed data in Table 9.1, and we assume that
it remains valid for the system in operation. We also assume that the system in
operation will be exposed to random $(h, \alpha, v)$ taking values in $\mathcal{X}$, with independent
components, and such that $\mathbb{E}[G(h, \alpha, v)] \geq 11.0$ mm^2.
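
The "Lipschitz with tolerance" condition (7.1) is easy to check pairwise on a finite data set. The sketch below uses the constants quoted above, but the three shots are hypothetical values chosen for illustration (they are not taken from Table 9.1), and $d_L$ is again assumed to be the weighted $\ell^1$ distance.

```python
# Lipschitz constants from the text: mm^2/in, mm^2/deg, mm^2/(m/s).
L_H, L_ALPHA, L_V = 175.0, 0.075, 0.1


def d_L(p, q):
    """Assumed weighted l1 distance between inputs (h, alpha, v)."""
    return (L_H * abs(p[0] - q[0])
            + L_ALPHA * abs(p[1] - q[1])
            + L_V * abs(p[2] - q[2]))


def violations(points, tol=1.0):
    """Index pairs (i, j) for which |G_i - G_j| > d_L(x_i, x_j) + tol."""
    bad = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (xi, gi), (xj, gj) = points[i], points[j]
            if abs(gi - gj) > d_L(xi, xj) + tol:
                bad.append((i, j))
    return bad


# Hypothetical shots: ((h [in], alpha [deg], v [m/s]), area [mm^2]).
shots = [((0.062, 0.0, 2500.0), 10.0),
         ((0.062, 0.0, 2500.0), 10.8),   # repeated input, different area
         ((0.125, 30.0, 2400.0), 3.0)]
```

With the tolerance T = 1.0 mm^2 the repeated input is admissible, whereas violations(shots, tol=0.0) flags the pair (0, 1): a multi-valued observation can never satisfy the strict Lipschitz condition.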

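In this example, the "failure" event is that the perforation area $G(h, \alpha, v)$ falls
below some threshold area $\theta$. Figure 7.2 shows the computed least upper bound on
$\mathbb{P}[G(h, \alpha, v) \leq \theta]$ for $\theta \in \{0, 1, \dots, 12\}$ mm^2. As expected, the least upper bound
on $\mathbb{P}[G(h, \alpha, v) \leq \theta]$ is indeed 1 when $\theta \geq m$, where $m = 11.0$ mm^2 is the assumed
lower bound on the mean, and decreases as $m - \theta$ increases.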

Remark 7.1 (Markov bound and non-binding data). One interesting feature of
Figure 7.2 is that the numerical results demonstrate very close agreement with the
Markov bound

\[ \mathbb{P}[G(h, \alpha, v) \leq \theta] \leq \frac{M - m}{M - \theta}, \tag{7.2} \]

where

\[ M := \sup_{(h, \alpha, v) \in \mathcal{X}} \min_{z \in \mathcal{O}} \bigl( G(z) + d_L(z, (h, \alpha, v)) + T \bigr) \approx 39.895 \text{ mm}^2, \tag{7.3} \]

with maximizer at

\[ (h_M, \alpha_M, v_M) \approx (0.062 \text{ in}, 0.0 \text{ deg}, 3138.6 \text{ m/s}), \tag{7.4} \]

is the largest perforation area that can be realised anywhere in $\mathcal{X}$ subject to the
data and the Lipschitz constraints. (We note in passing that efficient algorithms
for finding extrema of Lipschitz functions are an area of independent interest: see
e.g. [12].) Indeed, for $\theta \geq 2.0$ mm^2, the difference between the computed P and
Markov's bound is dominated by the numerical convergence criterion (less than
10~^ change over 100 consecutive generations).

This observation shows that most of the data set (i.e. those data points that
do not determine $M$) consists of non-binding data points; indeed, only the
constraints corresponding to data points A54 and A67 in Table 9.1 hold as equalities
at $((h_M, \alpha_M, v_M), M)$. Put another way, the other 30 data points carry no
information about P, and could have been ignored. Also, this finding suggests that the
best next experiment to reduce P would be to determine $G(h_M, \alpha_M, v_M)$, since if
$G(h_M, \alpha_M, v_M) \ll M$, then P will decrease considerably.
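
For reference, the Markov bound (7.2) follows from Markov's inequality applied to the non-negative random variable $M - G$, whose mean is at most $M - m$. The following minimal sketch evaluates it with the values of $M$ and $m$ quoted in the text; the function name markov_bound and the cap at 1 are our own conventions.

```python
def markov_bound(theta, M=39.895, m=11.0):
    """Upper bound (M - m) / (M - theta) on P[G <= theta], capped at 1.

    Assumes G <= M everywhere, E[G] >= m, and theta < M.
    """
    if theta >= M:
        raise ValueError("theta must be strictly less than M")
    return min(1.0, (M - m) / (M - theta))


# Bound for each threshold theta in {0, 1, ..., 12} mm^2.
bounds = {theta: markov_bound(float(theta)) for theta in range(13)}
```

The bound equals 1 as soon as theta reaches m and decreases as m - theta grows, in agreement with the qualitative behaviour reported for Figure 7.2.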

However, for $\theta \leq 1.0$ mm^2, a significant difference (10~^ or greater) is observed
between the computed P and Markov's bound, and this suggests that data points
other than A54 and A67 supply relevant information in these cases, and that it is no
longer feasible to have all the $\mu$-probability mass located at $((h_M, \alpha_M, v_M), M)$
and $((h', \alpha', v'), \theta)$.

Remark 7.2 (Dimensional collapse). The approximate maximizers in this problem
appear to undergo a kind of "dimensional collapse", as illustrated in Figure 7.3.
The extremizing measure $\mu$ does not have support on the 8 distinct points of a
non-degenerate discrete cube $C(x_0, x_1)$; instead, the support of the measure collapses
to one point in the $h$ and $\alpha$ marginals. This indicates that the uncertainty in the
impact velocity $v$ is the dominant uncertainty in this problem.

Furthermore, once this "dimensional collapse" phenomenon has been observed,
even approximately, it makes sense to try the calculation of P using 1 x 1 x 2 product
measures instead of 2 x 2 x 2 product measures; this approach always produces valid
lower bounds on P and, as Figure 7.4 shows, can greatly reduce the computational
burden. In this way, lower bounds on the solution of a large OUQ problem can be
found relatively quickly by considering lower-dimensional sub-problems.

Figure 7.3. Illustration of the dimensional collapse phenomenon
for the approximate maximizers for $\theta = 9.0$ mm^2 in Figure 7.2. The
first three rows show the $\mu$-probability (left column) and position
(right column) of the $h$, $\alpha$ and $v$ coordinates of the support of $\mu$.
The bottom-left figure shows the $y$-values, and the bottom-right
the negative of $\mu[g(X) \leq \theta]$, i.e. $-P$. In the later iterations, $\mu$ is
effectively a 1 x 1 x 2, not a 2 x 2 x 2, product measure.

Figure 7.4. Log-linear plot illustrating typical numerical convergence
for the approximate maximum Pnum for $\theta = 9.0$ mm^2 in
Figure 7.2 at full 2 x 2 x 2 dimensionality and reduced 1 x 1 x 2
dimensionality. Note the improvement to the convergence rate
obtained by operating at reduced dimensionality.

8. Generalizations

8.1. Additional Statistical Information. The approach of Section 4 is open to
a great deal of generalization, much more so than that of Section 3. In principle,
any information about $G$ and $\mathbb{P}$ can be used to define a set of admissible scenarios
$\mathcal{A}$ for the optimization problem (4.1)-(4.2); also, the objective function can be more
general than the probability of failure. Let $r\colon \mathcal{X} \to \mathbb{R}$ be measurable. As shown in
Figure 8.1. For $i \in \{1, 2\}$, bounds on the expected value of $X_i$
can be propagated through a system $G_i$ that is known on $\mathcal{O}_i$ and
has Lipschitz constant $L_i$ to yield optimal bounds on the
expectation of some output quantity $Y_i$. The bounds on $Y_1$ and $Y_2$ can
then be propagated through a third system $G_3$, and so on.

[2'i], if $\mathcal{A}$ is described by independence constraints and inequalities of the form

\[ \mathbb{E}_\mu[\varphi'_i] \leq 0, \quad \text{for } i \in \{1, \dots, n'\}, \]
\[ \mathbb{E}_{\mu_k}[\varphi^{(k)}_l] \leq 0, \quad \text{for } k \in \{1, \dots, K\},\ l \in \{1, \dots, n_k\}, \]

for given measurable functions $\varphi'_i\colon \mathcal{X} \to \mathbb{R}$ and $\varphi^{(k)}_l\colon \mathcal{X}_k \to \mathbb{R}$, then, to extremize
$\mathbb{E}_\mu[r]$ over $\mu \in \mathcal{A}$, it is sufficient to search over measures $\mu = \bigotimes_{k=1}^{K} \mu_k \in \mathcal{A}$ with $\mu_k$
supported on at most $n' + n_k + 1$ points of $\mathcal{X}_k$; this paper made use only of the case
$r = 1[f \leq \theta]$, $n' = 1$, $\varphi'_1 = m - f$, $n_k = 0$. In particular, independence assumptions
can be relaxed, and information about the moments and correlations of the input
random variables $X_k$ can be included in the definition of $\mathcal{A}$. If such information is
used, then a reduced upper bound on the probability of failure is obtained, but at
the cost of solving a higher-dimensional optimization problem.

Since, in general, the same methods can be used to provide optimal bounds
on $\mathbb{E}_\mu[r]$ for any quantity of interest $r$, the methods of this paper can be used
to optimally propagate uncertainties through a hierarchy (directed acyclic graph)
of partially-observed input-output relationships, as in [ ]. See Figure 8.1 for a
schematic illustration.



8.2. Measurement Uncertainty. Bounded measurement uncertainty can also be
incorporated in the inequality constraints. More precisely, suppose that an error of
up to $\pm\delta$ is associated to the observed value $G(z)$, and an error of up to $\delta'$ with
respect to the metric $d_L$ is associated to the corresponding input parameter value
$z$. Then the observed datum is not $(z, G(z))$ but rather some $(\tilde{z}, G(\tilde{z})) \in \mathcal{X} \times \mathbb{R}$
such that

\[ d_L(z, \tilde{z}) \leq \delta' \quad \text{and} \quad |G(z) - G(\tilde{z})| \leq \delta. \tag{8.1} \]

In this situation, the Lipschitz constraints of the form

\[ |y - G(z)| \leq d_L(x, z) \]

generalize to

\[ |y - \gamma| \leq d_L(x, \zeta), \tag{8.2} \]

where $(\zeta, \gamma) \in \mathcal{X} \times \mathbb{R}$ is a new optimization variable that plays the role of the
imperfectly-observed input-output pair $(z, G(z))$, and, therefore, is constrained to
satisfy

\[ d_L(\zeta, z) \leq \delta' \quad \text{and} \quad |\gamma - G(z)| \leq \delta. \tag{8.3} \]

Note that, geometrically, (8.2)-(8.3) corresponds to a pointed double cone with a
movable vertex that must remain close to $(z, G(z))$, whereas (7.1) corresponds to a
fixed and blunt double cone. Note that, as in the simple situation of Example 4.2,
the bounds $D_k$ and P may be discontinuous as functions of $\delta$ and $\delta'$.

If specific statistical information is available about the measurement uncertainty
(e.g. Gaussian scatter), then confidence intervals can be used in the above
procedure. The resulting bounds on $\mathbb{P}[G(X) \leq \theta]$ will be probabilistic in nature, and
will become looser as the required level of confidence increases.
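
To make the role of the movable vertex concrete, the following sketch checks the constraints (8.2)-(8.3) for a single datum. All names and numerical values are hypothetical, and $d_L$ is assumed to be a weighted $\ell^1$ distance.

```python
def d_L(x, xp, L):
    """Assumed weighted l1 distance sum_k L_k * |x_k - x'_k|."""
    return sum(Lk * abs(a - b) for Lk, a, b in zip(L, x, xp))


def vertex_admissible(zeta, gamma, z, Gz, L, delta, delta_prime):
    """Constraint (8.3): the vertex (zeta, gamma) stays close to (z, G(z))."""
    return d_L(zeta, z, L) <= delta_prime and abs(gamma - Gz) <= delta


def point_feasible(x, y, zeta, gamma, L):
    """Constraint (8.2): (x, y) lies in the double cone with vertex (zeta, gamma)."""
    return abs(y - gamma) <= d_L(x, zeta, L)


# Hypothetical one-dimensional datum with output error 0.1 and input error 0.05.
L = (1.0,)
z, Gz = (0.5,), 1.0
delta, delta_prime = 0.1, 0.05
x, y = (0.5,), 1.08

exact_ok = point_feasible(x, y, z, Gz, L)          # vertex fixed at the datum
moved_ok = (vertex_admissible((0.5,), 1.08, z, Gz, L, delta, delta_prime)
            and point_feasible(x, y, (0.5,), 1.08, L))
```

The point $(x, y)$ is infeasible when the vertex is pinned to the recorded datum, but becomes feasible once the vertex is allowed to move within the measurement error.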

8.3. Model-Based Certification. In many applications, although the real
response function $G\colon \mathcal{X} \to \mathbb{R}$ cannot be easily exercised, there may be a model
$F\colon \mathcal{X} \to \mathbb{R}$ for $G$ that can be used instead. Quantitative relationships between
$G$ and $F$ can be used to define sets of admissible scenarios as before. For example,
suppose that it is known that

\[ \|G - F\|_\infty := \sup_{x \in \mathcal{X}} |G(x) - F(x)| \leq C_V, \tag{8.4} \]

where $C_V \geq 0$ is some constant resulting from an exercise in model validation.
Then, compared with the admissible set $\mathcal{A}$ of (4.2), the corresponding set $\mathcal{A}_F$ that
uses also the model $F$ and the information (8.4) is

\[ \mathcal{A}_F := \left\{ (g, \mu) \,\middle|\, \begin{array}{l} g\colon \mathcal{X} \to \mathbb{R} \text{ is } d_L\text{-short}, \\ \mu \in \bigotimes_{k=1}^{K} \mathcal{P}(\mathcal{X}_k), \\ \|g - F\|_\infty \leq C_V,\ g = G \text{ on } \mathcal{O}, \text{ and } \mathbb{E}_\mu[g] \geq m \end{array} \right\} \subseteq \mathcal{A}. \]

Hence, in the $\mathcal{A}_F$-analogue of the reduced problem (4.12), the model $F$ and (8.4)
induce additional constraints of the form

\[ |y_\varepsilon - F(x_\varepsilon)| \leq C_V \quad \text{for each } \varepsilon \in \{0, 1\}^K. \]

As remarked above, $D_k$ and P may be discontinuous as functions of $C_V$.

Other quantitative measures of model validity can be used in similar ways.
Without going into detail, we note that the uniform norm in (8.4) is too strong for many
applications, particularly those in which $F$ or $G$ may have discontinuities: in such
cases, $\|F - G\|_\infty$ being small requires that $F$ and $G$ have approximately the same
discontinuities in $\mathbb{R}$ at exactly the same locations in $\mathcal{X}$, which is a very strong
requirement. Therefore, metrics that allow "wiggle room" in both $\mathcal{X}$ and $\mathbb{R}$, e.g.
the various Skorohod metrics [G, 31], are expected to be of use in this area. For
example, it may be reasonable to assume that the distance between the graphs of
$F$ and $G$ as subsets of $\mathcal{X} \times \mathbb{R}$ is small enough that, for some $C'_V > 0$,

\[ \sup_{x \in \mathcal{X}} \inf_{x' \in \mathcal{X}} \max\{ d_L(x, x'), |G(x) - F(x')| \} \leq C'_V; \tag{8.5} \]

i.e. every point on the graph of $G$ lies within distance $C'_V$ of some point on the
graph of $F$. (Note well that the roles of $F$ and $G$ in (8.5) are not symmetric.) In
this case, the corresponding constraint satisfied by any feasible $(x_\varepsilon, y_\varepsilon) \in \mathcal{X} \times \mathbb{R}$ is
that

\[ \inf_{x' \,:\, d_L(x_\varepsilon, x') \leq C'_V} |y_\varepsilon - F(x')| \leq C'_V. \]
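
The difference between the uniform norm in (8.4) and the one-sided graph distance in (8.5) can be seen in a small numerical sketch. The two step functions, the grid and the metric below are hypothetical choices made for illustration, and the suprema and infima are approximated by maxima and minima over the grid.

```python
def sup_norm(G, F, xs):
    """Grid approximation of sup_x |G(x) - F(x)|."""
    return max(abs(G(x) - F(x)) for x in xs)


def one_sided_graph_distance(G, F, xs, d):
    """Grid approximation of sup_x inf_x' max{d(x, x'), |G(x) - F(x')|}."""
    return max(
        min(max(d(x, xp), abs(G(x) - F(xp))) for xp in xs)
        for x in xs
    )


def G(x):
    """Hypothetical response with a unit jump at x = 0.5."""
    return 1.0 if x >= 0.5 else 0.0


def F(x):
    """Hypothetical model whose jump is slightly misplaced, at x = 0.55."""
    return 1.0 if x >= 0.55 else 0.0


def d(x, xp):
    """One-dimensional metric with unit Lipschitz constant."""
    return abs(x - xp)


xs = [i / 20 for i in range(21)]          # grid on [0, 1] with spacing 0.05
uniform = sup_norm(G, F, xs)
graph = one_sided_graph_distance(G, F, xs, d)
```

On this grid the uniform distance is 1, because the jumps do not coincide, whereas the graph distance is only about 0.05, the size of the misplacement.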

8.4. Set-Valued Lipschitz Functions. In many applications (e.g. inverse
problems, which are often ill-posed), the system of interest cannot be accurately
represented as a single-valued function $G\colon \mathcal{X} \to \mathbb{R}$. For example, the system outcome
may depend on so-called unknown unknowns, which can be neither controlled nor
even observed, but have the effect that $G(x)$ is not a uniquely determined real
number for each fixed $x \in \mathcal{X}$. One resolution to this problem is to treat $G$ as a
partially-observed set-valued function $G\colon \mathcal{X} \rightrightarrows \mathbb{R}$, i.e. an operation that assigns to
each $x \in \mathcal{X}$ a (possibly empty) subset of $\mathbb{R}$. There is a notion of Lipschitz
continuity for set-valued functions [2]: for metric spaces $(\mathcal{X}, d_{\mathcal{X}})$ and $(\mathcal{Y}, d_{\mathcal{Y}})$, a set-valued
function $G\colon \mathcal{X} \rightrightarrows \mathcal{Y}$ is said to be a set-valued Lipschitz function with Lipschitz
constant $L \geq 0$ if, for all $x, x' \in \mathcal{X}$,

\[ G(x) \subseteq \Bigl\{ y \in \mathcal{Y} \,\Bigm|\, \operatorname{dist}(y, G(x')) := \inf_{y' \in G(x')} d_{\mathcal{Y}}(y, y') \leq L \, d_{\mathcal{X}}(x, x') \Bigr\}, \tag{8.6} \]

that is, $G(x)$ is a subset of the uniform $L \, d_{\mathcal{X}}(x, x')$-neighbourhood of $G(x')$; or,
equivalently, the Hausdorff distance between the sets $G(x)$ and $G(x')$ is at most
$L \, d_{\mathcal{X}}(x, x')$.
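
For finite, non-empty sets of real values, condition (8.6) can be checked directly through the Hausdorff distance. The sketch below is illustrative only: the function names and the two-point example are hypothetical, and both metrics are taken to be the absolute value on the real line.

```python
def excess(A, B):
    """One-sided distance: sup over a in A of inf over b in B of |a - b|."""
    return max(min(abs(a - b) for b in B) for a in A)


def hausdorff(A, B):
    """Hausdorff distance between two finite non-empty sets of reals."""
    return max(excess(A, B), excess(B, A))


def is_set_valued_lipschitz(G, L):
    """Check hausdorff(G[x], G[x']) <= L * |x - x'| for all pairs of inputs.

    G maps each real input x to a finite non-empty list of real outputs.
    """
    xs = list(G)
    return all(
        hausdorff(G[x], G[xp]) <= L * abs(x - xp)
        for x in xs for xp in xs
    )


# Hypothetical set-valued observations at two inputs.
G_obs = {0.0: [0.0, 1.0], 1.0: [0.5, 1.5]}
```

In this example the Hausdorff distance between the two output sets is 0.5, so the observations are consistent with a Lipschitz constant of 1 but not with a Lipschitz constant of 0.25.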

It would be an interesting and natural extension of the present work to consider
set-valued response functions. Indeed, the set of single-valued Lipschitz extensions
$\mathcal{L}(\mathcal{X}, G|_{\mathcal{O}}, d_L)$ as defined in (3.2) defines a set-valued function $\mathbf{G}\colon \mathcal{X} \rightrightarrows \mathbb{R}$ by

\[ \mathbf{G}(x) := \{ g(x) \mid g \in \mathcal{L}(\mathcal{X}, G|_{\mathcal{O}}, d_L) \} \quad \text{for each } x \in \mathcal{X}. \]

$\mathbf{G}$ is a set-valued Lipschitz function, with Lipschitz constant 1 with respect to the
metric $d_L$, and $\mathcal{L}(\mathcal{X}, G|_{\mathcal{O}}, d_L)$ is the collection of Lipschitz selections [_', §9.4.3] of $\mathbf{G}$.
In this paper, since $G$ is assumed to be single-valued, the sets $\mathbf{G}(x)$ are all convex;
in the general situation, this need not be the case.



9. Appendix: Legacy Data 

Table 9.1 lists the data set that constrains the examples of Subsection 7.3. We 
thank the California Institute of Technology PSAAP Center's Experimental Science 
Group — in particular, M. Adams, J. M. Mihaly and A. Rosakis — for this data 
set. 